The ultimate bound to the accuracy of phase estimates is often assumed to be given by the Heisenberg limit. Recent work seemed to indicate that this bound can be violated, yielding measurements with much higher accuracy than was previously expected. The Heisenberg limit can be restored as a rigorous bound to the accuracy provided one considers the accuracy averaged over the possible values of the unknown phase, as we have recently shown [Phys. Rev. A 85, 041802(R) (2012)]. Here we present an expanded proof of this result together with a number of additional results, including the proof of a previously conjectured stronger bound in the asymptotic limit. Other measures of the accuracy are examined, as well as other restrictions on the generator of the phase shifts. We provide expanded numerical results for the minimum error and asymptotic expansions. The significance of the results claiming violation of the Heisenberg limit is assessed, followed by a detailed discussion of the limitations of the Cramér-Rao bound.

PACS numbers: 42.50.St, 03.65.Ta, 06.20.Dk 
I. INTRODUCTION

Phase estimation is the basis for much precision measurement. Optical interferometers offer highly accurate measurements of length, and atomic phase measurements provide highly accurate measurements of time, as well as other physical quantities like magnetic field [?]. In optics, most measurements are limited by the shot-noise limit, where the accuracy scales as $1/\sqrt{\langle N\rangle}$, where $N$ is the photon number operator. In contrast, it is normally assumed that the fundamental limit is the Heisenberg limit, where the accuracy scales as $1/\langle N\rangle$. This potentially provides far greater accuracy, but is extremely difficult to achieve in practice because it requires highly nonclassical states of light, as well as arbitrarily high efficiencies [?]. Any amount of loss will cause the scaling to revert to $1/\sqrt{\langle N\rangle}$ for large $\langle N\rangle$ [11].

Recently a number of papers suggested that the Heisenberg limit is not the fundamental limit to accuracy, and that a better scaling constant or even a higher power of $\langle N\rangle$ might be possible. In Ref. [12], Anisimov et al. gave a proposal for violating the Heisenberg limit by a small amount. In another work, Zhang et al. [13] proposed a scheme offering zero phase uncertainty with finite $\langle N\rangle$. Finally, in Ref. [14] Rivas and Luis presented a proposal for obtaining scaling as $1/\langle N\rangle^p$ for $p > 1$. A qualitatively different proposal for violating the Heisenberg limit is that based on nonlinear interferometry [15, 16]. However, that work differs in its use of terminology; it does not violate the Heisenberg limit in the sense we use here (see Sec. VII D).

A common feature of proposals to violate the Heisenberg limit is that they only work for a limited range of phases. Additional phase information would be needed to confine the phase to within the region where the measurement is accurate. One can consider first using a sequence of measurements to ensure that the phase lies within a suitable region, then using the super-Heisenberg measurement. If the overall measurement (consisting of the sequence of individual measurements) could yield better accuracy than the Heisenberg limit, then it could be regarded as providing a true improvement. On the other hand, if the resources required to localise the phase to the required region result in an overall measurement with accuracy that is not better than the Heisenberg limit, then the accuracy of the super-Heisenberg measurements would seem to be illusory.

An analogous situation was seen in considering the reciprocal-peak-likelihood as a measure of uncertainty. In Ref. [?] a technique was proposed that would apparently yield super-Heisenberg accuracy in terms of reciprocal-peak-likelihood. Later work found that, in practice, the proposal resulted in accuracy that was worse than the Heisenberg limit. Another example is that of NOON states. NOON states yield phase information scaling as the Heisenberg limit, but require initial phase information with similar accuracy. In that case, it is known how to combine measurements from multiple states to obtain an overall measurement that scales at the Heisenberg limit [?].

To evaluate whether the super-Heisenberg measurements would be able to yield an overall measurement violating the Heisenberg limit, we examined the case where the mean-square error is averaged over all phase shifts. We showed that the Heisenberg limit provides a rigorous lower bound to the square root of the average mean-square error (RAMSE) in such a case [22]. Therefore, no scheme that apparently beats the Heisenberg limit for a small range of phase could be used to construct an overall measurement starting from an unknown phase that beats the Heisenberg limit. An alternative approach is to determine the bound if the initial phase is restricted to a given range. In Ref. [23] it was shown that with such a restriction the usual Heisenberg limit can be multiplied by a factor proportional to the phase range, and further results have been given in Refs. [24-27]. An alternative approach has yielded a bound on the average of the error at just two locations [28].

The specific result from Ref. [22] is

$$ \delta\hat\Phi \ge \frac{k}{\langle G+1\rangle}, \qquad (1) $$

where $\delta\hat\Phi$ is the RAMSE, $G$ is the generator of the phase shifts, which is here assumed to have nonnegative integer eigenvalues, and $k$ is a constant. These quantities are explained in Sec. II below. We have analytically proven that this inequality holds with $k = k_A := \sqrt{2\pi/e^3} \approx 0.5593$ [22]. In Sec. III we give the full proof of that result, as well as a generalised result in terms of the absolute value of $G$ in the case where $G$ also has negative integer eigenvalues.

Numerical calculations indicate that the inequality holds with the larger scaling constant $k = k_C \approx 1.3761$. We give the detailed numerical results in Sec. IV, indicating that this result holds both for the RAMSE and the error estimated using the Holevo variance. In Sec. V we calculate asymptotic expansions for the RAMSE, providing strong analytic support for the scaling constant $k_C$, and proving that $k = k_C$ is valid in the asymptotic limit $\langle G\rangle \to \infty$. We examine the scaling with the number of probe states in Sec. VI, then give a detailed discussion of the papers claiming violation of the Heisenberg limit in Sec. VII. The Cramér-Rao bound and the error propagation formula are commonly used in examining the Heisenberg limit, but have some limitations; these are discussed in Sec. VIII.

II. FIGURES OF MERIT FOR AVERAGE PHASE RESOLUTION

There are a number of different figures of merit for phase measurements. Before describing these, we first introduce some notation, largely following Ref. [22]. The random variable for the phase shift of the system is $\Phi$, and the random variable for the estimate of that phase shift is $\hat\Phi$. The error in the phase estimate is $\Theta = \hat\Phi - \Phi$. We use capital letters for the random variables; the corresponding values and measurement outcomes are denoted by the corresponding lower case letters ($\phi$, $\hat\phi$, and $\theta$).

We consider a Hilbert space with a phase shift operator $G$. In the completely general case, the only restriction is that the eigenvalues of $G$ must be integers. We may also consider the specific case where the eigenvalues are all nonnegative integers, in which case we denote the operator by $N$. This includes, for example, the case of photon number. Alternatively, if the eigenvalues include all integers, such as for angular momentum, we use the symbol $J$.

The phase shift is described by the unitary operator $\exp(-iG\phi)$. That is, the probe state $\rho_0$ becomes $\rho_\phi := e^{-iG\phi}\rho_0 e^{iG\phi}$. The detection method used to estimate $\phi$ is described by a positive-operator valued measure (POVM) $\{M_{\hat\phi}\}$. Hence, the probability distribution is given by $p(\hat\phi|\phi) = \mathrm{Tr}(M_{\hat\phi}\rho_\phi)$. Because phase is only defined modulo $2\pi$, we do not distinguish between $\phi$ and $\phi + 2\pi$, or between $\hat\phi$ and $\hat\phi + 2\pi$. This means that $p(\hat\phi|\phi) = p(\hat\phi + 2\pi k|\phi)$ for any integer $k$, and $p(\hat\phi|\phi)$ is normalised over an (arbitrary) $2\pi$ interval.



A. Root-mean-square error 

The most common figure of merit for a measurement is the square root of the mean-square error (MSE). We will call this the RMSE. For a specific phase shift, $\phi$, the MSE is given by

$$ (\Delta^{\phi_r}_\phi\hat\Phi)^2 := \int_{\phi_r-\pi}^{\phi_r+\pi} d\hat\phi\, (\hat\phi-\phi)^2\, p(\hat\phi|\phi). \qquad (2) $$

There is a subtlety in that for phase, values that differ by $2\pi$ are equivalent, which means that a range of $2\pi$ must be specified for the integral. However, the reference phase shift, $\phi_r$, is arbitrary, and the value that is obtained for the MSE will depend on $\phi_r$. Ideally $\phi$ should be near the centre of the range. If it is near one of the bounds of the range then the MSE will be unreasonably large.

To solve this problem, it is convenient to take the difference $\hat\phi-\phi$ modulo $(-\pi,\pi]$. That is, we add or subtract a multiple of $2\pi$ such that the value obtained is in the range $(-\pi,\pi]$. It is important to note that this convention can only decrease the value obtained for the MSE. In this work we are concerned with placing lower bounds on the MSE. We prove these lower bounds for the MSE with the difference defined modulo $2\pi$. Because this MSE is no larger than that obtained without taking the difference modulo $(-\pi,\pi]$, all results hold for that case as well. Thus, it is natural to work with the minimum MSE, given by

$$ (\Delta_\phi\hat\Phi)^2 := \int_{\phi_r-\pi}^{\phi_r+\pi} d\hat\phi\, \{(\hat\phi-\phi)\ \mathrm{mod}\,(-\pi,\pi]\}^2\, p(\hat\phi|\phi) = \int_{\phi-\pi}^{\phi+\pi} d\hat\phi\, (\hat\phi-\phi)^2\, p(\hat\phi|\phi), \qquad (3) $$

where we have used the fact that $p(\hat\phi|\phi)$ repeats modulo $2\pi$. It follows that

$$ (\Delta_\phi\hat\Phi)^2 \le (\Delta^{\phi_r}_\phi\hat\Phi)^2 \qquad (4) $$

for any reference phase $\phi_r$.
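The effect of the modulo convention can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical von Mises-shaped density for the estimate and a reference phase of zero; the function names and parameter values are illustrative and are not taken from the paper. It compares the MSE computed over a fixed window with the MSE computed after wrapping the difference into $(-\pi,\pi]$.

```python
import numpy as np

def wrap(x):
    """Map angles to the interval (-pi, pi]."""
    return np.pi - np.mod(np.pi - x, 2 * np.pi)

def mse(phi_hat, p, phi, wrapped):
    """Riemann-sum mean-square error of a density p sampled on the grid phi_hat."""
    d = phi_hat - phi
    if wrapped:
        d = wrap(d)
    dphi = phi_hat[1] - phi_hat[0]
    return float(np.sum(d**2 * p) * dphi)

# Integration window [phi_r - pi, phi_r + pi) with the assumed reference phase phi_r = 0.
phi_hat = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
phi = 3.0  # true phase shift, deliberately close to the edge of the window

# Hypothetical periodic density for the estimate, peaked at the true phase.
p = np.exp(20 * (np.cos(phi_hat - phi) - 1.0))
p /= np.sum(p) * (phi_hat[1] - phi_hat[0])

mse_ref = mse(phi_hat, p, phi, wrapped=False)  # fixed window, difference not wrapped
mse_min = mse(phi_hat, p, phi, wrapped=True)   # difference taken modulo (-pi, pi]
print(mse_ref, mse_min)
```

Because part of the peak falls on the far side of the window, the unwrapped value is dominated by apparent errors close to $2\pi$, whereas the wrapped value reflects the actual width of the peak.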

The above is a measure of the accuracy of the phase measurement only for a specific phase shift $\phi$. It is trivial to see that one can always choose a measurement such that the MSE can be zero for a specific phase shift, $\phi_0$: the trivial measurement that always yields the result $\hat\phi = \phi_0$. In reality, for a phase measurement the phase shift is unknown; otherwise a measurement would be unnecessary. To be useful, a measurement must give accurate results for a range of phase shifts.

A rigorous way of taking account of the range of phase is to average the figure of merit over the phase shift. For the MSE one would use

$$ \int d\phi\, p(\phi)\, (\Delta_\phi\hat\Phi)^2, \qquad (5) $$

where $p(\phi)$ is a probability distribution describing the prior information about the phase shift. In this work we consider the case that there is no prior information, so $p(\phi) = 1/2\pi$. Then the average MSE (AMSE) is given by

$$ (\delta\hat\Phi)^2 := \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, (\Delta_\phi\hat\Phi)^2. \qquad (6) $$

One then finds that

$$ (\delta\hat\Phi)^2 = \int_{-\pi}^{\pi} d\theta\, \theta^2\, p(\theta) = \langle\Theta^2\rangle. \qquad (7) $$

Here $p(\theta)$ is the probability density for the error in the phase estimate $\Theta = \hat\Phi - \Phi$, and is defined by

$$ p(\theta) := \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, p(\theta+\phi|\phi). \qquad (8) $$

We call $\delta\hat\Phi$ the RAMSE, because it is averaged over $\phi$ before taking the square root, whereas the RMSE $\Delta_\phi\hat\Phi$ is for a specific $\phi$.

Equation (7) holds because the mean-square error is a linear figure of merit. A general figure of merit for the accuracy of a phase estimate $\hat\Phi$ can be defined as a functional, $F$, that takes as input a probability density in $\Theta$, and outputs a scalar. In the case that $F$ is linear, we find that

$$ \int d\phi\, p(\phi)\, F(p(\theta+\phi|\phi)) = F(p(\theta)). \qquad (9) $$

This means that, for linear measures, the average figure of merit and the figure of merit of the average distribution are equivalent.

More generally, consider a convex figure of merit; that is, one that satisfies

$$ F(t\,p_1(\theta) + (1-t)\,p_2(\theta)) \le t\,F(p_1(\theta)) + (1-t)\,F(p_2(\theta)), \qquad (10) $$

for $t \in [0, 1]$. By using Jensen's inequality, one obtains

$$ \int d\phi\, p(\phi)\, F(p(\theta+\phi|\phi)) \ge F\left(\int d\phi\, p(\phi)\, p(\theta+\phi|\phi)\right). \qquad (11) $$

What this means is that, if the figure of merit is convex, then placing a lower bound on the figure of merit for the average distribution also provides a lower bound on the average of the figure of merit. That is the approach we use in this work; we find lower bounds on the figure of merit for the average distribution, which also hold for the average of the figure of merit.



B. Holevo variance and average bias 

There are alternative measures of the spread which are similar to the MSE but which are specifically defined for phase. These are typically defined in terms of the average of the exponential of the phase, $\langle e^{i\hat\Phi}\rangle$. In the case that the phase distribution is sharply peaked, the magnitude of this quantity will be close to 1. One possibility for quantifying the uncertainty in the phase is $2(1 - |\langle e^{i\hat\Phi}\rangle|)$ [29]; another is $1 - |\langle e^{i\hat\Phi}\rangle|^2$ [30, 31].

A measure of this type with some nice properties is that proposed by Holevo [32],

$$ \mathrm{Var}_{H,\phi}(\hat\Phi) := |\langle e^{i\hat\Phi}\rangle|^{-2} - 1, \qquad (12) $$

which has been dubbed the Holevo variance [33]. Here the subscript $\phi$ indicates that the variance is determined for a specific value of the phase shift. That is,

$$ \mathrm{Var}_{H,\phi}(\hat\Phi) = \left|\int d\hat\phi\, e^{i\hat\phi}\, p(\hat\phi|\phi)\right|^{-2} - 1. \qquad (13) $$

In this case there is no ambiguity in choosing the bounds of the integral, because the argument is clearly periodic modulo $2\pi$.

A minor problem with this definition is that it does not penalise biased estimates. However, this is easily corrected by using the modified definition

$$ \mathrm{Var}_{H,\phi}(\hat\Phi) := [\mathrm{Re}\langle e^{i(\hat\Phi-\phi)}\rangle]^{-2} - 1. \qquad (14) $$

If the measurement is "$U(1)$-unbiased", in the sense that

$$ \phi = \arg\langle e^{i\hat\Phi}\rangle, \qquad (15) $$

then these two expressions for the Holevo variance are equivalent.

The Holevo variance is a convex functional of the probability distribution. From Eq. (11), this means that one can place a lower bound on the average Holevo variance by considering the Holevo variance of the average distribution. That is,

$$ (\delta_H\hat\Phi)^2 := (\mathrm{Re}\langle e^{i\Theta}\rangle)^{-2} - 1 \qquad (16) $$

is a lower bound on the average value of $\mathrm{Var}_{H,\phi}(\hat\Phi)$. In this paper we do not discuss the Holevo variance without averaging over $\phi$, so we will refer to $(\delta_H\hat\Phi)^2$ as the Holevo variance.
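The statement that the Holevo variance of the average distribution lower bounds the average Holevo variance can be checked numerically. The sketch below is only an illustration under stated assumptions: the error densities are hypothetical von Mises shapes whose concentration and bias vary with the phase shift, and all names and parameter values are invented for the example.

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
dtheta = theta[1] - theta[0]

def error_density(kappa, bias):
    """Hypothetical von Mises error density with concentration kappa and bias."""
    p = np.exp(kappa * (np.cos(theta - bias) - 1.0))
    return p / (np.sum(p) * dtheta)

def holevo_variance(p):
    """Bias-penalising Holevo variance of an error density: (Re<e^{i Theta}>)^-2 - 1."""
    c = np.sum(np.cos(theta) * p) * dtheta
    return c**-2 - 1.0

# One error density per phase shift phi; the dependence on phi is an assumption.
phis = np.linspace(-np.pi, np.pi, 64, endpoint=False)
densities = [error_density(5.0 + 4.0 * np.cos(phi), 0.3 * np.sin(phi)) for phi in phis]

average_of_variances = float(np.mean([holevo_variance(p) for p in densities]))
variance_of_average = float(holevo_variance(np.mean(densities, axis=0)))
print(variance_of_average, average_of_variances)
```

The ordering follows from the convexity of $x \mapsto x^{-2} - 1$ for $x > 0$ together with the linearity of $\mathrm{Re}\langle e^{i\Theta}\rangle$ in the density.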

In the case that the average distribution $p(\theta)$ is $U(1)$-unbiased, in the sense that $\langle e^{i\Theta}\rangle$ is real and positive, then

$$ (\delta_H\hat\Phi)^2 = |\langle e^{i\Theta}\rangle|^{-2} - 1. \qquad (17) $$

If the average distribution is biased, then it can be modified to obtain a $U(1)$-unbiased measurement. Taking $\theta_{\rm av} := \arg\langle e^{i\Theta}\rangle$, we can replace the measurement operators $M_{\hat\phi}$ with

$$ M'_{\hat\phi} := M_{\hat\phi+\theta_{\rm av}}. \qquad (18) $$

Then, for these new measurement operators, $p'(\hat\phi|\phi) = p(\hat\phi + \theta_{\rm av}|\phi)$, so

$$ \begin{aligned} \langle e^{i\Theta}\rangle_{M'} &= \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi \int_{-\pi}^{\pi} d\hat\phi\, e^{i(\hat\phi-\phi)}\, p(\hat\phi+\theta_{\rm av}|\phi) \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi \int_{-\pi}^{\pi} d\hat\phi\, e^{i(\hat\phi-\theta_{\rm av}-\phi)}\, p(\hat\phi|\phi) \\ &= e^{-i\theta_{\rm av}}\langle e^{i\Theta}\rangle_M. \end{aligned} \qquad (19) $$

Hence this modification of the measurement yields a $U(1)$-unbiased average measurement.

With this condition, we can bound the mean-square error by using the following inequality,

$$ |\langle e^{i\Theta}\rangle| = \langle\cos\Theta\rangle \ge \cos\sqrt{\langle\Theta^2\rangle}, \qquad (20) $$

where we have used the fact that $\cos\sqrt{x}$ is a convex function, along with Jensen's inequality. There are alternative ways to bound the mean-square error, but this particular inequality will be useful in Appendix C. It also has the nice property that it can be saturated, for a probability distribution that is just delta functions at $\pm\sqrt{\langle\Theta^2\rangle}$. Now consider the limit where the mean-square error $(\delta\hat\Phi)^2 = \langle\Theta^2\rangle$ is small. Expanding as a Maclaurin series in this small parameter, we obtain

$$ (\delta\hat\Phi)^2 \ge \left(\arccos\{[(\delta_H\hat\Phi)^2 + 1]^{-1/2}\}\right)^2 = (\delta_H\hat\Phi)^2 - \tfrac{2}{3}(\delta_H\hat\Phi)^4 + O((\delta_H\hat\Phi)^6). \qquad (21) $$

This means that, except for higher-order terms, the Holevo variance bounds the AMSE from below, so asymptotically we have $(\delta_H\hat\Phi)^2 \le (\delta\hat\Phi)^2$.

We can also use the Holevo variance to bound the AMSE from above. Using the fact that $\cos\theta \le 1 - 2\theta^2/\pi^2$ on the interval $[-\pi, \pi]$, we have the inequality

$$ \langle\cos\Theta\rangle \le 1 - \frac{2\langle\Theta^2\rangle}{\pi^2}. \qquad (22) $$

Using this, we have

$$ (\delta\hat\Phi)^2 \le \frac{\pi^2}{2}\left(1 - [(\delta_H\hat\Phi)^2 + 1]^{-1/2}\right) = \frac{\pi^2}{4}(\delta_H\hat\Phi)^2 - \frac{3\pi^2}{16}(\delta_H\hat\Phi)^4 + O((\delta_H\hat\Phi)^6). \qquad (23) $$

The reason for the factor of $\pi^2/4$ is that even for small variance, the main contribution to the AMSE can be from large phase errors. The inequality (22) is saturated for a distribution that has contributions at $\pm\pi$.

Returning to the asymptotic lower bound (21), its significance is that any lower bound on the Holevo variance is also asymptotically a lower bound on the AMSE. In particular, it is known that for canonical phase measurements on a single-mode field there is the tight asymptotic lower bound on the Holevo variance $\delta_H\hat\Phi \ge k_C/\langle N\rangle$ with $k_C := 2(-z_A/3)^{3/2} \approx 1.3761$ (where $z_A$ is the first zero of the Airy function) [34, 35]. This is tight in the sense that, asymptotically, the Holevo variance is equal to this value with any difference being of higher order. Because the Holevo variance is asymptotically a lower bound on the usual AMSE, we must also asymptotically have the lower bound $\delta\hat\Phi \ge k_C/\langle N\rangle$ for canonical phase measurements on a single-mode field. It will be shown in Sec. V that this is in fact a tight lower bound.
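The lower bound of Eq. (21) and the upper bound of Eq. (23) on the AMSE in terms of the Holevo variance can be checked on sample distributions. The following sketch assumes a hypothetical von Mises-shaped error distribution, discretised on a grid; the function name and the concentration values are illustrative only.

```python
import numpy as np

def bounds_check(kappa, n=4000):
    """Return (lower bound, AMSE, upper bound, Holevo variance) for a sample distribution."""
    theta = np.linspace(-np.pi, np.pi, n, endpoint=False)
    w = np.exp(kappa * (np.cos(theta) - 1.0))
    w /= w.sum()                           # discrete probabilities on the grid
    amse = float(np.sum(theta**2 * w))     # <Theta^2>
    c = float(np.sum(np.cos(theta) * w))   # <cos Theta>, positive for this family
    holevo = c**-2 - 1.0
    root = min(1.0, (holevo + 1.0) ** -0.5)
    lower = float(np.arccos(root) ** 2)             # arccos form of the lower bound
    upper = float(0.5 * np.pi**2 * (1.0 - root))    # upper bound from cos(t) <= 1 - 2 t^2 / pi^2
    return lower, amse, upper, holevo

for kappa in (0.5, 2.0, 10.0, 50.0):
    lower, amse, upper, holevo = bounds_check(kappa)
    print(kappa, lower, amse, upper, holevo)
```

For sharply peaked distributions the AMSE sits close to the lower bound and to the Holevo variance itself, while the upper bound is looser by roughly the factor $\pi^2/4$ discussed in the text.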



C. Entropic length 

Another measure of concentration is the entropic length [36, 37]. This is given by

$$ L(\Theta) := e^{H(\Theta)}, \qquad (24) $$

where $H(\Theta)$ is the entropy of the error probability density,

$$ H(\Theta) := -\int_{-\pi}^{\pi} p(\theta)\ln(p(\theta))\, d\theta. \qquad (25) $$

The entropy takes its largest positive value for a flat distribution, and takes large negative values as the distribution provides more information about the phase. The negative of the entropy provides a measure of how much information about the phase is available. The entropic length is correspondingly small for a distribution providing a lot of information about the phase.

Similar to the AMSE or the Holevo variance, the entropic length will be small for a sharply peaked distribution. However, in contrast to those measures, the entropic length will also be small if there are multiple sharp peaks, with a value roughly equal to the total width of those peaks. The entropic length satisfies several basic properties expected for a length, discussed in Ref. [?]. It can also be used to provide a lower bound to the RAMSE via the relation [37]

$$ \delta\hat\Phi \ge (2\pi e)^{-1/2} L(\Theta). \qquad (26) $$

This is because, if one were considering a distribution on the infinite line, the entropy is maximised for fixed $\delta\hat\Phi$ by a Gaussian distribution, in which case $\delta\hat\Phi = (2\pi e)^{-1/2} L(\Theta)$. For the case of phase, we are limited to the interval $[-\pi, \pi]$, which means that the Gaussian distribution cannot be obtained exactly. Therefore the inequality still holds, but cannot be saturated except asymptotically.
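As an illustration of the entropic length and of the bound (26), the following sketch evaluates both sides for three hypothetical error densities: a single peak, two sharp peaks at $\pm\pi/2$, and the flat distribution. The densities and all names are assumptions made for the example.

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
dtheta = theta[1] - theta[0]

def normalise(w):
    """Scale a nonnegative array so that it integrates to one on the grid."""
    return w / (np.sum(w) * dtheta)

def entropic_length(p):
    """L = exp(H) with H = -int p ln p dtheta; points with p = 0 contribute nothing."""
    mask = p > 0
    h = -np.sum(p[mask] * np.log(p[mask])) * dtheta
    return float(np.exp(h))

def ramse(p):
    """Square root of <Theta^2> for the error density p."""
    return float(np.sqrt(np.sum(theta**2 * p) * dtheta))

single_peak = normalise(np.exp(10.0 * (np.cos(theta) - 1.0)))
double_peak = normalise(np.exp(200.0 * (np.cos(theta - np.pi / 2) - 1.0))
                        + np.exp(200.0 * (np.cos(theta + np.pi / 2) - 1.0)))
flat = normalise(np.ones_like(theta))

for name, p in (("single", single_peak), ("double", double_peak), ("flat", flat)):
    print(name, ramse(p), entropic_length(p) / np.sqrt(2 * np.pi * np.e))
```

The two-peak example shows the qualitative difference noted in the text: its entropic length is small, of the order of the combined peak width, even though its RAMSE is close to $\pi/2$.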

In contrast to the other measures considered here, the entropy is not convex. This means that one needs to be cautious when considering the average entropy. The entropy of the average distribution does not provide a lower bound on the average of the entropies. We do not determine the lower bound on the average of entropies; this is an open problem.



III. OBTAINING UNIVERSAL BOUNDS FROM 
NONDEGENERATE BOUNDS 

We now present the universal form of the Heisenberg limit, which was first derived in Ref. [22]. In subsection A we present the theorem showing that bounds which hold for canonical measurements on nondegenerate systems also hold for completely arbitrary measurements on general systems. In optics a single-mode field is nondegenerate, whereas the general case includes multimode interferometry. In subsection B we use this to provide our universal form of the Heisenberg limit. In Sec. IV we will present numerical results indicating that a better scaling constant is possible.



A. Mapping the general problem to a 
nondegenerate problem 

As discussed at the start of Sec. II, the detection method may be described by a POVM $\{M_{\hat\phi}\}$, which gives the probability distribution via

$$ p(\hat\phi|\phi) = \mathrm{Tr}(M_{\hat\phi}\rho_\phi). \qquad (27) $$

A particularly useful form of POVM is a covariant POVM. Whereas for an arbitrary POVM the individual $M_{\hat\phi}$ can be chosen independently of each other (except for the normalisation requirement), for a covariant POVM only one measurement operator may be chosen, then all others are related via the generator of shifts. In particular,

$$ M_{\hat\phi} = e^{-iG\hat\phi} M_0\, e^{iG\hat\phi}. \qquad (28) $$

For a covariant POVM, the probability distribution for the error in the estimate is independent of the phase shift. This may be shown via

$$ \begin{aligned} p(\theta+\phi|\phi) &= \mathrm{Tr}(M_{\theta+\phi}\rho_\phi) \\ &= \mathrm{Tr}(e^{-iG(\theta+\phi)} M_0\, e^{iG(\theta+\phi)} e^{-iG\phi}\rho_0\, e^{iG\phi}) \\ &= \mathrm{Tr}(e^{-iG\theta} M_0\, e^{iG\theta}\rho_0). \end{aligned} \qquad (29) $$
A particular form of covariant POVM is the canonical POVM. This can be defined as $\{e^{-iG\hat\phi} C_0\, e^{iG\hat\phi}\}$, with [?]

$$ C_0 := \frac{1}{2\pi}\sum_{n,n'\in S}\ \sum_{d\le D(n),D(n')} |n,d\rangle\langle n',d|. \qquad (30) $$

Here we have labelled the states with $n$ and $n'$ indicating the eigenvalues of $G$, and $d$ the degeneracy. The function $D(n)$ gives the degeneracy for eigenvalue $n$. $S$ denotes the spectrum of eigenvalues of $G$, which we have assumed to be the integers or a subset thereof. This definition of a canonical POVM is not unique in general, because it depends on the labelling of the degenerate states; a fact which was not noted in Refs. [22], [38], and which does not affect the results therein. However, we will only require the simpler case of no degeneracies in what follows, where for this case the POVM is uniquely given by $\{e^{-iG^{(s)}\hat\phi} C_0^{(s)} e^{iG^{(s)}\hat\phi}\}$, with

$$ C_0^{(s)} := \frac{1}{2\pi}\sum_{n,n'\in S} |n\rangle\langle n'|. \qquad (31) $$

We use $G^{(s)}$ to denote a generator with the same spectrum of eigenvalues as $G$, but nondegenerate.
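For a nondegenerate generator with eigenvalues $0, 1, 2, \ldots$ and a pure probe state with amplitudes $\psi_n$, the canonical POVM gives the density $|\sum_n \psi_n e^{in\theta}|^2/2\pi$, and covariance means that the distribution of the estimate depends only on the difference between estimate and phase shift. The sketch below illustrates both points; the probe amplitudes, the truncation to four levels, and the function names are assumptions chosen for the example.

```python
import numpy as np

def canonical_density(psi, theta):
    """Canonical phase density |sum_n psi_n exp(i n theta)|^2 / (2 pi), n = 0, 1, ..."""
    n = np.arange(len(psi))
    amplitude = np.exp(1j * np.outer(theta, n)) @ psi
    return np.abs(amplitude) ** 2 / (2 * np.pi)

def shifted_state(psi, phi):
    """Apply the phase shift exp(-i G phi) with G = diag(0, 1, 2, ...)."""
    n = np.arange(len(psi))
    return np.exp(-1j * n * phi) * psi

# Hypothetical probe state on four levels, normalised to unit length.
psi = np.array([1.0, 1.0j, 0.5, 0.2])
psi = psi / np.linalg.norm(psi)

theta = np.linspace(-np.pi, np.pi, 512, endpoint=False)
dtheta = theta[1] - theta[0]

phi = 0.7
p_error_unshifted = canonical_density(psi, theta)                    # error density at phi = 0
p_error_shifted = canonical_density(shifted_state(psi, phi), theta + phi)  # p(theta + phi | phi)
print(np.max(np.abs(p_error_shifted - p_error_unshifted)))
print(np.sum(p_error_unshifted) * dtheta)
```

The first printed value is at the level of rounding error, reflecting that the error distribution does not depend on the phase shift, and the second confirms normalisation of the density.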

We now show that any average phase distribution, $p(\theta)$, can be obtained by a covariant measurement, and that the covariant measurement result can be obtained by a canonical measurement on a system without degeneracy. In Ref. [22] we obtained this result by a three-step process: first that any average phase distribution, $p(\theta)$, can be obtained by a covariant measurement; second that the covariant measurement result can be obtained by a canonical measurement; and third that the canonical measurement result can be obtained by a canonical measurement on a system without degeneracy. Here we simplify the proof by combining the second two steps.

To express these results it is convenient to modify the notation slightly. We will use subscripts on the probability $p$ to indicate the POVM used. In addition, we will indicate the state used in the probability. In the case of the probability for the measurement error $\theta$ for the covariant POVM, we omit $\phi$, because the probability is independent of $\phi$ as discussed above. Therefore, we replace $p(\theta+\phi|\phi)$ with $p_M(\theta|\rho_0)$.

Expressed in terms of this notation, the first result is as follows.

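Lemma 1. For any POVM $\{M_{\hat\phi}\}$, there exists a covariant POVM $\{\bar M_{\hat\phi}\}$ such that for all states $\rho_0$,

$$ p_{\bar M}(\theta|\rho_0) = p_M(\theta|\rho_0). \qquad (32) $$

Proof. This result is well known [32], but we provide a proof here for completeness. Given POVM $\{M_{\hat\phi}\}$, we define the covariant POVM via

$$ \bar M_0 := \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, e^{iG\phi} M_\phi\, e^{-iG\phi}, \qquad (33) $$

with the remaining operators $\bar M_{\hat\phi}$ obtained from $\bar M_0$ as in Eq. (28). Then we find

$$ \begin{aligned} p_{\bar M}(\theta|\rho_0) &= \mathrm{Tr}(e^{-iG\theta}\bar M_0\, e^{iG\theta}\rho_0) \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, \mathrm{Tr}(e^{-iG\theta} e^{iG\phi} M_\phi\, e^{-iG\phi} e^{iG\theta}\rho_0) \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, p_M(\phi|\phi-\theta) \\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, p_M(\phi+\theta|\phi) = p_M(\theta|\rho_0). \end{aligned} \qquad (34) $$

In the last line we have shifted the variable of integration. This shows the relation (32) required. □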



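The second result, which is a combination of the two steps given in Ref. [22], is as follows.

Lemma 2. Given any covariant POVM $\{\bar M_{\hat\phi}\}$ and state $\rho_0$, there exists a state without degeneracies $\rho_0^{(s)}$ such that the probability distribution of $G^{(s)}$ for $\rho_0^{(s)}$ is the same as that of $G$ for $\rho_0$, and

$$ p_{C^{(s)}}(\theta|\rho_0^{(s)}) = p_{\bar M}(\theta|\rho_0). \qquad (35) $$

Proof. Choose the state without degeneracies via

$$ \rho_0^{(s)} := 2\pi \sum_{n,n'\in S} |n'\rangle\langle n| \sum_{d\le D(n),\, d'\le D(n')} \langle n',d'|\rho_0|n,d\rangle \times \langle n,d|\bar M_0|n',d'\rangle. \qquad (36) $$

Note that $\rho_0^{(s)}$ is indeed a density operator, and yields the same distribution for $G^{(s)}$ as $\rho_0$ does for $G$. The details for how to prove these facts are given in Appendix A.

It can be shown that the average phase distributions are also the same, via

$$ \begin{aligned} p_{C^{(s)}}(\theta|\rho_0^{(s)}) &= \mathrm{Tr}(e^{-iG^{(s)}\theta} C_0^{(s)} e^{iG^{(s)}\theta}\rho_0^{(s)}) \\ &= \mathrm{Tr}\Big[ e^{-iG^{(s)}\theta}\, \frac{1}{2\pi}\sum_{m,m'\in S} |m\rangle\langle m'|\, e^{iG^{(s)}\theta}\; 2\pi \sum_{n,n'\in S} |n'\rangle\langle n| \sum_{d\le D(n),\, d'\le D(n')} \langle n',d'|\rho_0|n,d\rangle \times \langle n,d|\bar M_0|n',d'\rangle \Big] \\ &= \sum_{n,n'\in S} e^{i(n'-n)\theta} \sum_{d\le D(n),\, d'\le D(n')} \langle n',d'|\rho_0|n,d\rangle \langle n,d|\bar M_0|n',d'\rangle \\ &= \sum_{n,n'\in S}\ \sum_{d\le D(n),\, d'\le D(n')} \langle n',d'|\rho_0|n,d\rangle \langle n,d|e^{-iG\theta}\bar M_0\, e^{iG\theta}|n',d'\rangle \\ &= \sum_{n'\in S}\ \sum_{d'\le D(n')} \langle n',d'|\rho_0\, e^{-iG\theta}\bar M_0\, e^{iG\theta}|n',d'\rangle = \mathrm{Tr}(e^{-iG\theta}\bar M_0\, e^{iG\theta}\rho_0) = p_{\bar M}(\theta|\rho_0). \end{aligned} \qquad (37) $$

This shows the relation (35) required. □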



Using these lemmas then enables us to prove our theorem that the average distribution can always be obtained by a canonical measurement on a system without degeneracies.

Theorem 1. Any bound on the concentration of the canonical phase distribution of a nondegenerate system with shift generator $G^{(s)}$, under some constraint $C$ on the distribution of $G^{(s)}$, is also a bound on the concentration of the average phase distribution $p(\theta)$ of an arbitrary phase estimate for any shift generator $G$ having the same eigenvalue spectrum as $G^{(s)}$, providing that the probe state satisfies the same constraint $C$ with respect to the distribution of $G$.

Remarks. A measure of the concentration is a functional of the probability distribution, and includes the mean-square error, the Holevo variance, and the entropic length. For measures that are convex, such as the mean-square error and Holevo variance, lower bounds on the measure for the average distribution provide lower bounds on the average of that measure (see Sec. II A). By the distribution of $G$, we mean the probability distribution for the eigenvalues of $G$. Examples of constraints on the distribution of $G$ are a fixed mean $\langle G\rangle$, an upper bound on the eigenvalues, or a fixed mean absolute value $\langle|G|\rangle$.

Proof. Consider any state $\rho_0$ that satisfies the constraint $C$ on the distribution of $G$. Given an arbitrary measurement described by a POVM $\{M_{\hat\phi}\}$, we obtain an average phase distribution $p(\theta)$. Using Lemma 1 we find that there exists a covariant POVM $\{\bar M_{\hat\phi}\}$ such that the same probability distribution is obtained with the same state, $\rho_0$. Next, using Lemma 2 there exists a state without degeneracy, $\rho_0^{(s)}$, such that the nondegenerate canonical measurement on $\rho_0^{(s)}$ produces the same phase distribution, and the distribution of $G^{(s)}$ is the same as the distribution of $G$ for $\rho_0$.

Therefore, the distribution of $G^{(s)}$ for $\rho_0^{(s)}$ still satisfies the same constraint $C$. Furthermore, because the probability distribution for the canonical measurement $p_{C^{(s)}}(\theta|\rho_0^{(s)})$ is equal to the average phase distribution $p_M(\theta|\rho_0)$, any measure of the concentration of the probability distribution is unchanged. Because any value of the concentration that can be obtained for the arbitrary measurement under constraint $C$ can also be obtained for the concentration of the canonical phase distribution under the same constraint, the arbitrary measurement must satisfy the same bound as the canonical measurement. □
B. Analytic bounds via an entropic uncertainty relation



It is possible to obtain a number of bounds by using entropic uncertainty relations. The entropic uncertainty relation for canonical phase measurements and a nondegenerate shift generator $G$ is given by [39, 40]

$$ H(\Theta) + H(G) \ge \ln 2\pi. \qquad (38) $$

This can then be used to obtain bounds on the RAMSE $\delta\hat\Phi$. In particular, combining Eqs. (24), (26) and (38) yields

$$ \delta\hat\Phi \ge (2\pi e)^{-1/2} e^{H(\Theta)} \ge (2\pi/e)^{1/2} e^{-H(G)}. \qquad (39) $$

We first specialise to the case where the eigenvalue spectrum $S$ includes all nonnegative integers, so we denote the generator by $N$. The entropy for fixed mean number is maximised for the thermal (negative exponential) distribution. By a straightforward calculation, one can show that this results in the inequality

$$ H(N) \le \ln\langle N+1\rangle + \langle N\rangle\ln(1 + 1/\langle N\rangle). \qquad (40) $$

Because $x\ln(1 + 1/x) < 1$, this yields (for finite expectation values)

$$ H(N) < \ln\langle N+1\rangle + 1. \qquad (41) $$

Substitution into Eq. (39) then gives

$$ \delta\hat\Phi \ge \frac{k_A}{\langle N+1\rangle}, \qquad (42) $$

where $k_A = \sqrt{2\pi/e^3} \approx 0.5593$ (defined in the Introduction). Using Theorem 1, this result holds for the RAMSE for all possible phase measurements, and for any shift generator with nonnegative integer eigenvalues. Recall that, because the MSE is a linear measure, the RMSE of the average distribution is equivalent to the RAMSE [see Eqs. (6) and (7)].
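The steps from the thermal-state entropy to the constant $k_A$ can be reproduced numerically. The sketch below sums the entropy of the thermal distribution $p_n = \bar n^n/(\bar n+1)^{n+1}$ directly, compares it with the closed form on the right-hand side of Eq. (40), and evaluates the resulting bound of Eq. (42). The truncation of the sum and the function names are assumptions made for the example.

```python
import math

K_A = math.sqrt(2 * math.pi / math.e**3)  # scaling constant of Eq. (42)

def thermal_entropy(nbar, nmax=5000):
    """Shannon entropy (in nats) of the thermal distribution, summed up to nmax terms."""
    q = nbar / (nbar + 1.0)
    h = 0.0
    for n in range(nmax):
        p = (1.0 - q) * q**n
        if p > 0.0:
            h -= p * math.log(p)
    return h

def entropy_closed_form(nbar):
    """Right-hand side of Eq. (40): ln(nbar + 1) + nbar ln(1 + 1/nbar)."""
    return math.log(nbar + 1.0) + nbar * math.log(1.0 + 1.0 / nbar)

def ramse_lower_bound(nbar):
    """Lower bound k_A / <N + 1> on the RAMSE."""
    return K_A / (nbar + 1.0)

for nbar in (0.5, 3.0, 20.0):
    print(nbar, thermal_entropy(nbar), entropy_closed_form(nbar), ramse_lower_bound(nbar))
```

The closed form agrees with the direct sum, and it stays below $\ln(\bar n + 1) + 1$, which is the step that produces the factor $e^{-1}$ absorbed into $k_A$.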

We can also use this result to infer the result in the more general case where there is some lower bound $g$ on the eigenvalues of $G$. Then we can take $G = g\mathbb{1} + N$, so $\langle N\rangle = \langle G - g\rangle$. Then one obtains

$$ \delta\hat\Phi \ge \frac{k_A}{\langle G - g + 1\rangle}. \qquad (43) $$

Note that, for this result, it is not necessary for the spectrum $S$ to include all integers above $g$. This is because, in minimising $\delta\hat\Phi$ for given $\langle G - g\rangle$, removing some integers restricts the possible states, and therefore can only increase the AMSE.

An alternative restriction that one may wish to consider is, instead of a fixed mean, a fixed mean of the absolute value, $\langle|G|\rangle$. This is of particular interest in the case of angular momentum, where $G = J$. Then fixed $\langle|J|\rangle$ corresponds to a mean absolute value of the angular momentum. The maximum entropy for fixed $\langle|G - g|\rangle$, where $g$ is any real number, can be obtained by finding a critical point of the variational quantity

$$ \Lambda = -\sum_n p_n\ln p_n - \alpha\sum_n p_n - \beta\sum_n |n - g|\, p_n, \qquad (44) $$

where $\alpha$ and $\beta$ are variational parameters. As shown in Appendix B, this yields

$$ H(G) \le \ln(2\langle|G - g|\rangle + 1) + 1. \qquad (45) $$

Substitution in Eq. (39) then gives

$$ \delta\hat\Phi \ge \frac{k_A}{2\langle|G - g|\rangle + 1}. \qquad (46) $$

Once again we note that this result holds both when $S$ includes all integers, so $G = J$, and when $S$ does not include all integers. In the latter cases, the maximum entropy distribution cannot be obtained exactly, but it still provides a bound.

For a given state, one can adjust the value of $g$ in order to maximise this lower bound. The optimal value is the median; that is, the value such that there is equal probability for eigenvalues above and below $g$.

Another restriction on the distribution that can be considered is a finite range of eigenvalues. For example, with number we have a minimum eigenvalue of 0, and can place an upper bound of $n_{\max}$ on the eigenvalues. Then the entropy is bounded as

$$ H(G) \le \ln(n_{\max} + 1), \qquad (47) $$

because the maximum entropy is for the flat distribution. Then, combining with (39) gives

$$ \delta\hat\Phi \ge \frac{\sqrt{2\pi/e}}{n_{\max} + 1}. \qquad (48) $$

In the specific case of the Holevo variance, there is a well-known result for canonical measurements [33, 41],

$$ \delta_H\hat\Phi \ge \tan\left(\frac{\pi}{n_{\max} + 2}\right). \qquad (49) $$

This result is achievable for arbitrary $n_{\max}$. Using our Theorem, this result also holds for the average distribution for arbitrary measurements. Furthermore, because the Holevo variance is a convex functional of the probability distribution, this bound holds for the root-mean value of the Holevo variance (averaging over phase shifts).
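The bound (49) can be reproduced with a small eigenvalue calculation. For a canonical measurement on a state with real amplitudes $\psi_0,\ldots,\psi_{n_{\max}}$, the quantity $\langle e^{i\Theta}\rangle = \sum_n \psi_n\psi_{n+1}$ is a quadratic form, so its maximum is the largest eigenvalue of a tridiagonal matrix. The sketch below is an illustration with assumed function names; it relies on the standard fact that this largest eigenvalue is $\cos[\pi/(n_{\max}+2)]$.

```python
import numpy as np

def min_holevo_deviation(n_max):
    """Smallest delta_H over states supported on n = 0..n_max (canonical measurement)."""
    dim = n_max + 1
    # psi^T a psi = sum_n psi_n psi_{n+1}, which equals <e^{i Theta}> for real psi.
    a = 0.5 * (np.eye(dim, k=1) + np.eye(dim, k=-1))
    c_max = np.linalg.eigvalsh(a)[-1]
    return float(np.sqrt(c_max**-2 - 1.0))

for n_max in (1, 5, 50):
    print(n_max, min_holevo_deviation(n_max), np.tan(np.pi / (n_max + 2)))
```

The two printed columns coincide, which is the sense in which the bound is achievable for each $n_{\max}$.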



IV. OPTIMAL BOUNDS VIA NUMERICAL 
CALCULATIONS 

The bound in Eq. (42) has a scaling constant of $k_A = \sqrt{2\pi/e^3} \approx 0.5593$. In contrast, based on the asymptotic result for Holevo variance [34, 35], we expect $\delta\hat\Phi \ge k_C/\langle N\rangle$ with $k_C = 2(-z_A/3)^{3/2} \approx 1.3761$ for large $\langle N\rangle$, where $z_A$ is defined in Sec. II B. This indicates that the scaling constant of the bound in Eq. (42) is not optimal, and suggests the conjecture [22]

$$ \delta\hat\Phi \ge \frac{k_C}{\langle N+1\rangle}. \qquad (50) $$

In order to test this conjecture, we solved the variational problem to find the minimum value of the RAMSE or Holevo variance as a function of $\langle N\rangle$. The results supporting this conjecture are given in this section. In Sec. V the conjecture is proved analytically for the special case of the asymptotic limit $\langle N\rangle \to \infty$.

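The two scaling constants quoted above can be reproduced with a few lines of Python. This is only a sketch: the first zero of the Airy function, $z_A \approx -2.3381$, is hard-coded from standard tables (an assumption of this example rather than something computed here), and the function names are illustrative.

```python
import math

# First zero of the Airy function Ai (standard tabulated value, hard-coded).
Z_A = -2.338107410459767

def k_A():
    """Scaling constant of the analytic bound: sqrt(2*pi/e^3)."""
    return math.sqrt(2.0 * math.pi / math.e**3)

def k_C():
    """Conjectured optimal scaling constant: 2*(-z_A/3)^(3/2)."""
    return 2.0 * (-Z_A / 3.0) ** 1.5

print(round(k_A(), 4), round(k_C(), 4))
```

The ratio `k_C() / k_A()` is roughly 2.5, which quantifies how far the analytic constant is from the conjectured optimum.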


A. Holevo variance 

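The case of the Holevo variance is simplest, because the
problem is to maximise $|\langle e^{i\Theta}\rangle|$. Given a state

$$ |\psi\rangle = \sum_{n=0}^{\infty} \psi_n |n\rangle, \tag{51} $$

we have

$$ |\langle e^{i\Theta}\rangle| = \left| \sum_{n=0}^{\infty} \psi_n^* \psi_{n+1} \right|. \tag{52} $$

Note that we can upper bound this expression via

$$ \left| \sum_{n=0}^{\infty} \psi_n^* \psi_{n+1} \right| \le \sum_{n=0}^{\infty} |\psi_n|\,|\psi_{n+1}|. \tag{53} $$

The normalisation and $\langle N\rangle$ are unaffected by replacing
the coefficients $\psi_n$ with their absolute values. Therefore,
for maximisation of $|\langle e^{i\Theta}\rangle|$, we can always
take $\psi_n$ to be real and nonnegative.

From the above, the variational problem is thus to find a
critical point of

$$ \Lambda = \sum_{n=0}^{\infty} \left( \psi_n\psi_{n+1} - \alpha\psi_n^2 - \beta n \psi_n^2 \right), \tag{54} $$

where $\alpha$ and $\beta$ correspond to normalisation and mean
photon number constraints. The variational condition
$\partial\Lambda/\partial\psi_n = 0$ leads directly to the
eigenvalue equation

$$ \psi_{n-1} + \psi_{n+1} = 2(\alpha + \beta n)\psi_n, \tag{55} $$

for $n \ge 1$, and $\psi_1 = 2\alpha\psi_0$ for $n = 0$. To avoid
the need to specify a different equation for $n = 0$, we can
simply define $\psi_{-1} := 0$.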

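A minimal numerical sketch of this eigenvalue problem is given below. It assumes a truncation of the number basis at a finite `cutoff` and uses a dense NumPy eigensolver for brevity (a sparse tridiagonal solver would be the natural choice for large cutoffs). The function names are illustrative and this is not the code used to produce the results in this paper.

```python
import numpy as np

def solve_holevo(beta, cutoff):
    """Solve psi[n-1] + psi[n+1] = 2*(alpha + beta*n)*psi[n] on n = 0..cutoff-1.

    Written as (C - beta*N) psi = alpha psi, where C has 1/2 on the first
    off-diagonals; the optimal state belongs to the largest eigenvalue alpha.
    """
    n = np.arange(cutoff)
    off = 0.5 * np.ones(cutoff - 1)
    M = np.diag(-beta * n) + np.diag(off, 1) + np.diag(off, -1)
    alphas, vecs = np.linalg.eigh(M)
    psi = np.abs(vecs[:, -1])                       # coefficients can be taken nonnegative
    mean_n = float(np.sum(n * psi**2))              # <N>
    sharpness = float(np.sum(psi[:-1] * psi[1:]))   # |<e^{i Theta}>|
    return alphas[-1], psi, mean_n, sharpness

def holevo_std(sharpness):
    """Square root of the Holevo variance |<e^{i Theta}>|^{-2} - 1."""
    return np.sqrt(sharpness**-2 - 1.0)

alpha, psi, mean_n, sharpness = solve_holevo(beta=0.01, cutoff=200)
print(mean_n, (mean_n + 1.0) * holevo_std(sharpness))
```

Varying `beta` sweeps out different values of the mean photon number, and the printed product can be compared with the constant $k_C \approx 1.3761$.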


B. Root-mean-square error 

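The problem for minimising the RAMSE is somewhat more
difficult, because we do not have a simple expression like
Eq. (52). However, any well-behaved function (i.e., satisfying
the Dirichlet conditions) can be expanded in a Fourier series
on the interval $[-\pi, \pi]$ as

$$ f(\theta) = \sum_{m=-\infty}^{\infty} a_m e^{im\theta}. \tag{56} $$

For $m > 0$, the expectation values of the exponentials are
given by

$$ \langle e^{im\Theta}\rangle = \sum_{n=0}^{\infty} \psi_n^* \psi_{n+m}. \tag{57} $$

For $m < 0$, the expectation values are just the complex
conjugate of those for positive $m$.

Unlike the case of the Holevo variance, it is not obvious at
first sight that we can take the state coefficients to be real.
However, if $f(\theta)$ is real, and symmetric about $\theta = 0$,
then $a_{-m} = a_m^* = a_m$. Therefore the expectation value of
$f(\Theta)$ is given by

$$ \langle f(\Theta)\rangle = \sum_{m,n=0}^{\infty} \psi_m^* Z_{mn} \psi_n, \tag{58} $$

where $Z$ is the real symmetric matrix with coefficients
$Z_{mn} := a_{m-n}$. The variational problem is then to find a
critical point of

$$ \Lambda = \langle f(\Theta)\rangle - \alpha\langle\psi|\psi\rangle - \beta\langle N\rangle = \sum_{m,n=0}^{\infty} \psi_m^* \left( Z_{mn} - \alpha\delta_{mn} - \beta n\,\delta_{mn} \right) \psi_n. \tag{59} $$

The variational condition leads to

$$ \sum_{n=0}^{\infty} \left( Z_{mn} - \beta n\,\delta_{mn} \right) \psi_n = \alpha\psi_m. \tag{60} $$

This equation is solved as an eigenvalue equation with $\alpha$
as the eigenvalue. Because the corresponding matrix
$Z - \beta N$ is real and symmetric in the number state basis,
the eigenvectors are real in this basis (up to a global phase
factor). This means that the state coefficients can indeed be
taken to be real.

In the specific case of $f(\theta) = \theta^2$, the Fourier series is

$$ \theta^2 = \frac{\pi^2}{3} + 4\sum_{m=1}^{\infty} \frac{(-1)^m}{m^2} \cos m\theta. \tag{61} $$

We then obtain the eigenvalue equation

$$ \frac{\pi^2}{3}\psi_m + \sum_{\substack{n=-m \\ n\neq 0}}^{\infty} \frac{2(-1)^n}{n^2}\,\psi_{m+n} = (\alpha + \beta m)\psi_m. \tag{62} $$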



Numerical solution of this eigenvalue equation is difficult, 
because there are an infinite number of Fourier coeffi- 
cients. The problem can be truncated at some maximum 
number, but solution still requires finding the eigenvalues 
of a full matrix. In contrast, the problem for the Holevo 
variance is sparse, and can therefore be solved much more 
efficiently. 
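To make the contrast with the sparse Holevo problem concrete, the following Python sketch builds the full matrix $Z$ for $f(\theta) = \theta^2$ on a truncated number basis and minimises $\langle\Theta^2\rangle$ with a penalty on $\langle N\rangle$. It is a sketch under stated assumptions: the truncation `cutoff`, the positive weight `penalty` (playing the role of $-\beta$ in the eigenvalue equation), and the function names are all choices made for this example.

```python
import numpy as np

def theta2_matrix(cutoff):
    """Dense matrix Z[m, n] = a_{m-n} for f(theta) = theta^2.

    Fourier coefficients: a_0 = pi^2/3 and a_k = 2*(-1)^k / k^2 for k != 0.
    """
    idx = np.arange(cutoff)
    k = idx[:, None] - idx[None, :]
    Z = np.full((cutoff, cutoff), np.pi**2 / 3.0)
    off = k != 0
    Z[off] = 2.0 * (-1.0) ** np.abs(k[off]) / k[off] ** 2
    return Z

def solve_rmse(penalty, cutoff):
    """Minimise <Theta^2> + penalty*<N> over real states on n = 0..cutoff-1."""
    n = np.arange(cutoff)
    Z = theta2_matrix(cutoff)
    vals, vecs = np.linalg.eigh(Z + penalty * np.diag(n))
    psi = vecs[:, 0]                     # eigenvector of the smallest eigenvalue
    mean_n = float(psi @ (n * psi))      # <N>
    mse = float(psi @ Z @ psi)           # <Theta^2>
    return psi, mean_n, mse

psi, mean_n, mse = solve_rmse(penalty=0.02, cutoff=120)
print(mean_n, (mean_n + 1.0) * np.sqrt(mse))
```

Every entry of `Z` is nonzero, so the cost of the eigenvalue problem grows much faster with the cutoff than in the tridiagonal case.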
C. Bounding the AMSE

As we are interested in testing a lower bound on the AMSE, we
can alternatively use an expression with a finite number of
Fourier coefficients, but that forms a lower bound on
$\theta^2$. One alternative is to use
$f_1(\theta) := 2(1 - \cos\theta)$, which is the same optimisation
problem as for the Holevo variance. To show
$f_1(\theta) \le \theta^2$ on $[-\pi, \pi]$, we can use a Taylor
expansion to third order with the Lagrange form of the remainder,

$$ f_1(\theta) = \theta^2 + \frac{f_1'''(\xi)}{3!}\theta^3 = \theta^2 - \frac{1}{3}\theta^3 \sin\xi, \tag{63} $$

where $\xi \in [0, \theta]$. Because $\sin\xi$ has the same sign as
$\theta^3$, the remainder term is negative, and
$f_1(\theta) \le \theta^2$.

The drawback to this alternative is that it yields results that
do not satisfy the conjectured lower bound. We therefore use a
higher-order approximation given by

$$ f_2(\theta) := \frac{5}{2} - \frac{8}{3}\cos\theta + \frac{1}{6}\cos 2\theta. \tag{64} $$

Again expanding in a Taylor series,

$$ f_2(\theta) = \theta^2 + \frac{f_2'''(\xi)}{3!}\theta^3 = \theta^2 - \frac{4}{9}\theta^3 (1 - \cos\xi)\sin\xi \le \theta^2. \tag{65} $$

We can also obtain an upper bound using (see Appendix E)

$$ f_3(\theta) := (\pi^2/4 - 1)\left[ 2(1 - \cos\theta) - (1 - \cos 2\theta)/2 \right] + 2(1 - \cos\theta). \tag{66} $$

In the following we will use
$(\delta_m\Phi)^2 := \langle f_m(\Theta)\rangle$ for $m \in \{1, 2, 3\}$.

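The ordering $f_1(\theta) \le f_2(\theta) \le \theta^2 \le f_3(\theta)$ on $[-\pi, \pi]$ can be spot-checked numerically. The following Python sketch evaluates the three trigonometric functions on a grid; the grid size and tolerance are arbitrary choices for this illustration, and a grid check is of course not a proof.

```python
import numpy as np

def f1(t):
    return 2.0 * (1.0 - np.cos(t))

def f2(t):
    return 2.5 - (8.0 / 3.0) * np.cos(t) + np.cos(2.0 * t) / 6.0

def f3(t):
    bracket = 2.0 * (1.0 - np.cos(t)) - (1.0 - np.cos(2.0 * t)) / 2.0
    return (np.pi**2 / 4.0 - 1.0) * bracket + 2.0 * (1.0 - np.cos(t))

theta = np.linspace(-np.pi, np.pi, 20001)
tol = 1e-9  # allowance for floating-point rounding near theta = 0 and theta = +-pi
ordered = (
    np.all(f1(theta) <= f2(theta) + tol)
    and np.all(f2(theta) <= theta**2 + tol)
    and np.all(theta**2 <= f3(theta) + tol)
)
print(ordered)
```

Note that $f_3$ touches $\theta^2$ at $\theta = \pm\pi$, where both equal $\pi^2$, which is why a small tolerance is used.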
D. Numerical results 

The minimal Holevo variance, as well as the minimal values of
$\langle\Theta^2\rangle$ and $\delta_2\Phi$, have been determined by
numerically solving the eigenvalue equations. In each case, a
number cutoff was used that was about 10 times the value of
$\langle N\rangle$, or 100 for small $\langle N\rangle$. At this
point the magnitude of the state coefficients had fallen to less
than $10^{-6}$ of the maximum value, and increasing the cutoff
beyond this did not alter the results by more than 1 part in
$10^6$. For the results for $\langle\Theta^2\rangle$, the maximum
$\langle N\rangle$ was about 5000, due to the difficulty in
finding eigenvalues of a full matrix. In contrast, for the Holevo
variance and for $\delta_2\Phi$, the maximum $\langle N\rangle$ was
over $10^6$.

The results for the Holevo variance are given in Fig. 1. In this
figure the square root of the Holevo variance is plotted
multiplied by $\langle N + 1\rangle$. Therefore, if
$k_C/\langle N + 1\rangle$ provides a lower bound to the RAMSE, the
curve should be above $k_C$ (also shown in the figures). It is
clear from the figure that the numerical results indicate that
$k_C/\langle N + 1\rangle$ provides a strict lower bound to
$\delta_H\Phi$. In this figure $\delta_1\Phi$ is also shown, and
$\delta_1\Phi < k_C/\langle N + 1\rangle$ in the range shown.

FIG. 1: Minimum possible value of $\langle N + 1\rangle\delta_H\Phi$,
plotted as a function of $\langle N\rangle$ (solid curve). The case
where $\delta_1\Phi$ is used instead of $\delta_H\Phi$ is shown as
the dashed curve (green). The asymptotic value of
$k_C \approx 1.3761$ is shown as the horizontal dotted line (blue).

FIG. 2: Minimum possible value of $\langle N + 1\rangle\delta\Phi$,
plotted as a function of $\langle N\rangle$. The dotted curve
(green) shows the values obtained for $\delta\Phi$ (i.e., the
RAMSE). The solid curve (black) uses $\delta_2\Phi$ instead of
$\delta\Phi$. The dash-dotted curve (red) is the lower bound using
$\arccos[1 - (\delta_1\Phi)^2/2]$. The dashed curve (blue) is the
upper bound using $\delta_3\Phi$ calculated for the state that
minimises $\delta_1\Phi$. The asymptotic value of
$k_C \approx 1.3761$ is again shown as the horizontal dotted line
(blue).

The results calculated for $\delta\Phi$ are shown in Fig. 2. It can
be seen that these results are also above the line for $k_C$,
indicating that $\delta\Phi > k_C/\langle N + 1\rangle$. One would
like to provide more easily calculated lower bounds on
$\delta\Phi$ to test this inequality more thoroughly. It is clear
that $\delta_1\Phi$ is not useful for this purpose, because the
curve in Fig. 1 is below $k_C$. It is also possible to obtain a
tighter lower bound on $\delta\Phi$ using $\delta_1\Phi$ (see
Appendix C), but this curve is still not above $k_C$ for all
$\langle N\rangle$.

A better lower bound to $\delta\Phi$ is $\delta_2\Phi$, which is
also shown in Fig. 2 and is above the $k_C$ line. This quantity can
be calculated more rapidly and reliably than $\delta\Phi$, and
results are given up to $\langle N\rangle \approx 2 \times 10^6$.
This provides further numerical evidence that
$\delta\Phi > k_C/\langle N + 1\rangle$. Results were also
calculated for $\langle N\rangle$ down to about $10^{-6}$. These are
not shown in the figures, but the curves that are above $k_C$ do
not cross below $k_C$.



E. Angular momentum calculations 

We have also calculated the corresponding results with a fixed
value of $\langle |J| \rangle$. The variational problem is exactly
the same as before, except now we sum over positive and negative
values of $j$ (as opposed to $n$), and replace $n$ with $|j|$.
That is, the variational problem is to find a critical point of

$$ \Lambda = \langle f(\Theta)\rangle - \alpha\langle\psi|\psi\rangle - \beta\langle |J| \rangle. \tag{67} $$

As before, for a real function $f$ symmetric about zero we can
assume that the state coefficients are real, so the variational
condition yields

$$ \sum_{m=-\infty}^{\infty} a_m \psi_{j+m} = (\alpha + \beta|j|)\psi_j. \tag{68} $$

In the case of $f(\theta) = \theta^2$, we obtain the eigenvalue
equation

$$ \frac{\pi^2}{3}\psi_j + \sum_{\substack{m=-\infty \\ m\neq 0}}^{\infty} \frac{2(-1)^m}{m^2}\,\psi_{j+m} = (\alpha + \beta|j|)\psi_j. \tag{69} $$

The eigenvalue equation for the case of $f_1(\theta)$ is

$$ \psi_{j-1} + \psi_{j+1} = 2(\alpha + \beta|j|)\psi_j. \tag{70} $$

We will not consider $f_2(\theta)$ for this problem.

The results for $\delta\Phi$, $\delta_H\Phi$, and $\delta_1\Phi$
were all determined numerically, and the results are shown in
Fig. 3. It will be shown in the next section that the asymptotic
optimal value for $\delta_1\Phi$ is

$$ \frac{k_C'}{\langle 2|J| + 1\rangle}, \tag{71} $$

with $k_C' = 4(-z_A'/3)^{3/2} \approx 0.7916$, where $z_A'$ is the
first zero of the derivative of the Airy function. We have
therefore plotted the results multiplied by
$\langle 2|J| + 1\rangle$ in Fig. 3. It can be seen in this figure
that all the results are above $k_C'$, supporting the conjecture
that there is strict inequality with the scaling constant $k_C'$.

FIG. 3: Minimum possible value of
$\langle 2|J| + 1\rangle\delta\Phi$ is plotted as a function of
$\langle N\rangle$ as the dotted curve (green). The minimum value
of $\langle 2|J| + 1\rangle\delta_H\Phi$ is shown as the solid
curve (black), and $\langle 2|J| + 1\rangle\delta_1\Phi$ is shown
as the dashed curve (red). The asymptotic value of
$k_C' \approx 0.7916$ is shown as the horizontal dotted line (blue).



V. ASYMPTOTIC EXPANSIONS

A. Holevo variance

In the specific case of the Holevo variance, it is possible to
obtain analytic results in terms of Bessel functions to provide
further support to the conjecture that
$\delta_H\Phi > k_C/\langle N + 1\rangle$. The recurrence relation
(55) has a known solution in terms of Bessel functions [35].
Bessel functions of the first kind satisfy the recurrence relation
$J_{k-1}(z) + J_{k+1}(z) = (2k/z)J_k(z)$. Therefore the solution is
of the form

$$ \psi_n(x, z) = A J_{x+n+1}(z), \tag{72} $$

with $x := \alpha/\beta - 1$, $z := 1/\beta$. Bessel functions of
the second kind can be ignored, because they diverge for large
values of the order. The condition that $\psi_{-1} = 0$ implies the
restriction

$$ J_x(z) = 0 \tag{73} $$

on the parameter $z$, thus confining its allowed values to the
(countable) set of zeroes of $J_x$.

To obtain the smallest Holevo variance for a given mean photon
number, we wish to take the solution for the largest value of
$\alpha$. This corresponds to the largest solution of Eq. (73) in
terms of $x$ for given $z$. Conversely, for given $x$ we want the
first positive zero of $J_x$. The normalisation constraint yields

$$ A^{-2} = \sum_{n=0}^{\infty} [J_{x+n+1}(z)]^2 = \sum_{k=1}^{\infty} [J_{x+k}(z)]^2, \tag{74} $$

and hence one has

$$ \langle N\rangle = A^2 \sum_{n=0}^{\infty} n\,[J_{x+n+1}(z)]^2 = \frac{\sum_{k=1}^{\infty} (k-1)[J_{x+k}(z)]^2}{\sum_{k=1}^{\infty} [J_{x+k}(z)]^2}, \tag{75} $$

$$ \langle e^{i\Theta}\rangle = A^2 \sum_{n=0}^{\infty} J_{x+n+1}(z)\,J_{x+n+2}(z) = \frac{\sum_{k=1}^{\infty} J_{x+k}(z)\,J_{x+k+1}(z)}{\sum_{k=1}^{\infty} [J_{x+k}(z)]^2}. \tag{76} $$

Using Eq. (55), we have

$$ \langle e^{i\Theta}\rangle = \alpha + \beta\langle N\rangle = (x + \langle N\rangle + 1)/z. \tag{77} $$

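The identity $\langle e^{i\Theta}\rangle = \alpha + \beta\langle N\rangle = (x + \langle N\rangle + 1)/z$ follows from multiplying the recurrence relation by $\psi_n$ and summing over $n$, and it can be checked without evaluating any Bessel functions. The Python sketch below does this for a truncated version of the eigenvalue problem; the truncation, the dense solver, and the function names are assumptions of this example.

```python
import numpy as np

def top_state(beta, cutoff):
    """Largest-eigenvalue solution of (C - beta*N) psi = alpha psi on n = 0..cutoff-1."""
    n = np.arange(cutoff)
    off = 0.5 * np.ones(cutoff - 1)
    M = np.diag(-beta * n) + np.diag(off, 1) + np.diag(off, -1)
    vals, vecs = np.linalg.eigh(M)
    return vals[-1], np.abs(vecs[:, -1])

def check_sharpness_identity(beta, cutoff):
    """Return <e^{i Theta}>, alpha + beta*<N>, and (x + <N> + 1)/z for comparison."""
    alpha, psi = top_state(beta, cutoff)
    n = np.arange(cutoff)
    mean_n = float(np.sum(n * psi**2))
    sharpness = float(np.sum(psi[:-1] * psi[1:]))
    x, z = alpha / beta - 1.0, 1.0 / beta
    return sharpness, alpha + beta * mean_n, (x + mean_n + 1.0) / z

print(check_sharpness_identity(beta=0.01, cutoff=200))
```

All three returned numbers should coincide to rounding error, since the identity holds exactly for any eigenvector of the truncated matrix.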
Up until this point, these results for the Bessel functions are
the same as those of Ref. [35]. Reference [35] then uses an
approximation in terms of Airy functions. We have determined more
accurate results using formulae for sums of Bessel functions (see
Appendix D). We find that

$$ |\langle e^{i\Theta}\rangle|^{-2} - 1 = \sum_{k=2}^{11} \frac{b_k}{\langle N + 1\rangle^{k}} + O\left( \frac{1}{\langle N + 1\rangle^{12}} \right), \tag{78} $$

where $b_2 = k_C^2$, and $b_2$ to $b_{10}$ are all positive and
close to 2. The fact that each $b_j$ that has been calculated is
positive strongly supports the conjecture that the Holevo variance
is strictly lower bounded by the first term.



B. Upper bounding the optimal mean-square error

It would be desirable to obtain a similar approximation for the
exact RAMSE $\delta\Phi$. However, the eigenvalue equation does not
have any solution in terms of elementary functions that we have
been able to find. Even the lower bounding quantity $\delta_2\Phi$
yields an eigenvalue equation that does not appear to have an
analytic solution. However, we can place an upper bound on the
optimal value of $\delta\Phi$, using

$$ \theta^2 \le f_3(\theta), \tag{79} $$

for $\theta \in [-\pi, \pi]$. We can calculate $\delta_3\Phi$ for
the state that minimises $\delta_1\Phi$. This value is shown in
Fig. 2 for comparison with $\delta\Phi$.

Using the properties of Bessel functions, this leads to the result
that the optimal value of $\delta\Phi$ satisfies (see Appendix D)

$$ (\delta\Phi)^2 \le \frac{k_C^2}{\langle N + 1\rangle^2} + O\left( \frac{1}{\langle N + 1\rangle^3} \right). \tag{80} $$

This means that, asymptotically, the optimal value of $\delta\Phi$
cannot be larger than $k_C/\langle N + 1\rangle$. Because
$\delta\Phi$ cannot be smaller than $\delta_H\Phi$ except for
higher-order terms [see Eq. (21)], this means the optimal
$\delta\Phi$ must be asymptotically equal to
$k_C/\langle N + 1\rangle$ [i.e., $k_C$ is the largest value of $k$
for which Eq. (1) can be true].



C. Angular momentum calculations

Next we consider the problem with fixed $\langle |J| \rangle$.
Recall that the variational problem yields an eigenvalue problem
given in Eq. (70). This is solved by taking [42]

$$ \psi_j(x, z) = A_1 J_{x+j}(z), \tag{81} $$

for $j \ge 0$, and

$$ \psi_j(x, z) = A_2 J_{x-j}(z), \tag{82} $$

for $j \le 0$. In this case we take $x := \alpha/\beta$,
$z := 1/\beta$. We again may ignore Bessel functions of the second
kind, because they diverge. The restriction that the solutions
coincide for $j = 0$ means that $A_1 = A_2 = A$, and

$$ \psi_j(x, z) = A J_{x+|j|}(z), \tag{83} $$

for all $j$. The condition that the recurrence relation holds for
$j = 0$ means that

$$ J_{x+1}(z) = \frac{x}{z} J_x(z) = \frac{1}{2}\left[ J_{x-1}(z) + J_{x+1}(z) \right]. \tag{84} $$

This implies

$$ J_{x-1}(z) - J_{x+1}(z) = 0. \tag{85} $$

Then, using $[J_{x-1}(z) - J_{x+1}(z)]/2 = J_x'(z)$, this means we
must have $J_x'(z) = 0$.

Performing series expansions similar to that for the first case
gives (see Appendix E)

$$ 2\left( 1 - |\langle e^{i\Theta}\rangle| \right) = \sum_{k=2}^{6} \frac{d_k}{\langle 2|J| + 1\rangle^{k}} + O\left( \frac{1}{\langle 2|J| + 1\rangle^{7}} \right), \tag{86} $$

where $d_2 = k_C'^2$, and coefficients up to $d_5$ are positive,
but $d_6$ is negative. This strongly supports the numerical results
that the strict inequality

$$ \delta_1\Phi > \frac{k_C'}{\langle 2|J| + 1\rangle} \tag{87} $$

holds. In turn, because $\delta\Phi \ge \delta_1\Phi$, this also
supports the conjecture that the inequality holds for
$\delta\Phi$. In addition, $\delta_H\Phi \ge \delta_1\Phi$, so this
supports the conjecture that the inequality holds for
$\delta_H\Phi$.

Similarly to the case for fixed $\langle N\rangle$, one can use
Eq. (79) to find a series expansion for an upper bound on the
optimal value of $\delta\Phi$, giving

$$ (\delta\Phi)^2 \le \frac{k_C'^2}{\langle 2|J| + 1\rangle^2} + O\left( \frac{1}{\langle 2|J| + 1\rangle^3} \right). \tag{88} $$

This means that we have upper and lower bounds on the optimal
$\delta\Phi$, showing that it is asymptotically equal to
$k_C'/\langle 2|J| + 1\rangle$.
VI. SCALING WITH NUMBER OF PROBE STATES

Another question is what the scaling of the lower bound is if
there are $m$ identical probe states. Normally it is expected that
the RMSE will scale like $1/\sqrt{m}$ if there are $m$ copies of
the state. This is because, for estimates formed by the average of
the individual estimates, the standard error scales as
$1/\sqrt{m}$. Similarly, the Cramer-Rao bound for $m$ identical
probe states yields the following bound for estimates that are
unbiased (in the standard statistical sense, not what we have
called $U(1)$-unbiased in Sec. II) Hi EH:

$$ \Delta^{\phi}\Phi \ge \frac{1}{2\sqrt{m}\,\Delta N}. \tag{89} $$

We will call this the Helstrom-Holevo bound.

Because of these results one might expect that one could derive a
lower bound to the uncertainty in terms of $\langle N\rangle$ of
the form $k/(\sqrt{m}\langle N + 1\rangle)$. On the other hand,
directly using the above methods yields a lower bound of

$$ \delta\Phi \ge \frac{k}{m\langle N\rangle + 1}, \tag{90} $$

because the overall average number is $m\langle N\rangle$. Recall
that we have proven this inequality for $k = k_A$, and have
extremely strong numerical evidence for the inequality for
$k = k_C$.

We can prove that there is no lower bound scaling as
$1/(\sqrt{m}\langle N + 1\rangle)$ in the following way. Let
$m \in \mathbb{N}$, $\langle N\rangle$, and $\delta > 0$ be given.
We use $\mu$ for the required value of $\langle N\rangle$, to avoid
confusion with intermediate states we use in this discussion with
different values of $\langle N\rangle$.

Let $|\chi_{n-1}\rangle$ be the state with the minimum phase
uncertainty for mean number $\langle N\rangle = n - 1$, and let
$|\chi'_{n-1}\rangle$ be the corresponding state with the same
amplitudes, but shifted up by one. This means that there is no
vacuum component, the phase uncertainty is unchanged, and
$\langle N\rangle = n$. We are considering small $\delta$ and large
$m$, so we expect that $\mu^{\delta} < m$. In that case, we take
$n = (m\mu)^{1/(1+\delta)}$, and consider $m$ copies of

$$ |\psi\rangle = \sqrt{1 - \mu/n}\,|0\rangle + \sqrt{\mu/n}\,|\chi'_{n-1}\rangle. \tag{91} $$

For this state, $\langle N\rangle = \mu$. Now consider a phase
measurement that first distinguishes between $|0\rangle$ and
$|\chi'_{n-1}\rangle$ on all copies of the state. If the
$|\chi'_{n-1}\rangle$ result is found, then a canonical phase
measurement is performed.

The probability of getting the $|\chi'_{n-1}\rangle$ result is
$\mu/n$. For $m$ repetitions, the probability of projecting every
single copy onto the state $|0\rangle$ is
$(1 - \mu/n)^m \le \exp(-m\mu/n) = \exp(-(m\mu)^{\delta/(1+\delta)})$.
This probability scales exponentially in $m\mu$ and may be ignored
for asymptotically large $m\mu$. The phase uncertainty is therefore
(up to an exponentially small correction) no more than that for
$|\chi_{n-1}\rangle$, which is

$$ \delta\Phi = k_C/n + O(1/n^2) = k_C/(m\mu)^{1/(1+\delta)} + O\left( 1/(m\mu)^{2/(1+\delta)} \right). \tag{92} $$

For $\mu^{\delta} \ge m$, we can just take $n = \mu$, and
$|\psi\rangle = |\chi'_{n-1}\rangle$. In this case we have
$1/(m\mu)^{1/(1+\delta)} \ge 1/\mu$. Therefore, considering just the
uncertainty for a single copy of the state gives

$$ \delta\Phi = k_C/\mu + O(1/\mu^2) \le k_C/(m\mu)^{1/(1+\delta)} + O\left( 1/(m\mu)^{2/(1+\delta)} \right). \tag{93} $$

This provides an upper bound to the uncertainty for $m$ copies of
the state.

Therefore, we find that, for any $\delta > 0$, $m \in \mathbb{N}$
and $\mu = \langle N\rangle$, we can find a state such that the
uncertainty is no greater than $k_C/(m\mu)^{1/(1+\delta)}$ to
leading order. Because we can choose any $\delta > 0$, this means
that, for fixed $\langle N\rangle$, the lower bound to the scaling
must be arbitrarily close to $1/m$.

This result is counterintuitive, because for a state that does not
depend on $m$, the uncertainty can be expected to scale as
$1/\sqrt{m}$, similarly to the Helstrom-Holevo bound (89). However,
the Helstrom-Holevo bound, in terms of $m$ and $\Delta N$, holds
even for states that depend on $m$. Similarly, a bound in terms of
$m$ and $\langle N\rangle$ must hold for states that are chosen
based on $m$. We have shown that the potential dependence of states
upon $m$ means that it is not possible to obtain a universal bound
that scales as $1/\sqrt{m}$ for given $\langle N\rangle$.

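The bookkeeping in this construction can be illustrated numerically. The Python sketch below covers only the regime $\mu^{\delta} < m$, chooses $n = (m\mu)^{1/(1+\delta)}$, and compares the probability that every copy is projected onto the vacuum with its exponential bound. The function name and the sample parameter values are assumptions made for this example.

```python
import math

def vacuum_probability(m, mu, delta):
    """Return (n, probability that all m copies give |0>, exponential bound).

    Only the regime mu**delta < m is covered, where n = (m*mu)**(1/(1+delta)) >= mu.
    """
    if not mu**delta < m:
        raise ValueError("this sketch only covers the regime mu**delta < m")
    n = (m * mu) ** (1.0 / (1.0 + delta))
    p_vacuum = (1.0 - mu / n) ** m
    bound = math.exp(-((m * mu) ** (delta / (1.0 + delta))))
    return n, p_vacuum, bound

n, p_vacuum, bound = vacuum_probability(m=1000, mu=2.0, delta=0.1)
print(n, p_vacuum, bound)
```

Increasing `m` at fixed `mu` and `delta` makes both the probability and its bound shrink, while the leading-order uncertainty $k_C/n$ improves as $(m\mu)^{-1/(1+\delta)}$.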


VII. PAPERS CLAIMING VIOLATION OF 
HEISENBERG LIMIT 

In the following, we present some recent measurement schemes
claiming violation of the Heisenberg limit. We summarise the
techniques used in these schemes, and explain why they appear to
violate the Heisenberg limit. We argue that the accuracy of these
super-Heisenberg measurements should be considered illusory,
primarily because they only work for a very restricted range of
phase.



A. Anisimov et al. 

Anisimov et al. [12] describe a noncovariant phase estimation
method having a minimum RMSE

$$ \frac{1}{[\langle N\rangle(\langle N\rangle + 2)]^{1/2}}. \tag{94} $$

This quantity is for a particular phase shift, as opposed to the
average over the phase shift, $\delta\Phi$. Also, the RMSE is here
using a reference phase of 0, rather than the reference phase of
$\phi$ that we use (see Sec. II A). This result violates an
alternative definition of the Heisenberg limit, given by Anisimov
et al. as [12]

$$ \Delta^{0}\Phi \ge 1/\langle N\rangle. \tag{95} $$

First, it should be noted that this does not give a different
power of $\langle N\rangle$, and does not change the scaling
constant. It only violates this form of the Heisenberg limit by an
amount which is significant for small $\langle N\rangle$, and is of
higher order for large $\langle N\rangle$. In later work [3], they
have modified their claim to that of achieving the Heisenberg
limit.

In fact, it is easy to see that the above form (95) of the
Heisenberg limit cannot be a strict limit for small
$\langle N\rangle$. For any $\langle N\rangle$ less than $1/\pi$ it
must be violated, because the maximum RMSE possible is $\pi$. For
the same reason, any bound of the form $k/\langle N\rangle$ cannot
hold for all $\langle N\rangle$. It is for this reason that we have
used $\langle N + 1\rangle$ (or $\langle G + 1\rangle$ more
generally).

Note from Eq. (94) that the minimum RMSE satisfies

$$ \frac{1}{[\langle N\rangle(\langle N\rangle + 2)]^{1/2}} > \frac{1}{[\langle N\rangle(\langle N\rangle + 2) + 1]^{1/2}} = \frac{1}{\langle N + 1\rangle}. \tag{96} $$

Hence, the RAMSE, $\delta\Phi$, trivially satisfies our analytical
lower bound (42). However, the minimum value is below our
conjectured best possible bound (50), by a factor of
$k_C \approx 1.3761$ in the asymptotic limit. This does not
contradict our conjectured bound, because the conjectured bound is
for $\delta\Phi$, whereas the above value is for a specific value
of the phase shift. This is an important aspect of our result. It
is possible to obtain smaller errors for specific values of the
phase shift [23, 27], but not when the average is taken over the
phase. That is, the noncovariance of the scheme in Ref. [12] does
allow beating our conjectured bound, in a small range of phase
shifts about $\phi = 0$, but only at the expense of worse phase
resolution over the remainder of possible phase shift values.

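The comparison between the minimum RMSE of Eq. (94), the simple bound $1/\langle N + 1\rangle$, and the conjectured bound $k_C/\langle N + 1\rangle$ can be tabulated directly. The Python sketch below is purely illustrative; the function names and the sample values of $\langle N\rangle$ are assumptions of this example.

```python
import math

K_C = 1.3761  # asymptotic scaling constant quoted in the text

def anisimov_rmse(mean_n):
    """Minimum RMSE of Eq. (94) at a particular phase shift."""
    return 1.0 / math.sqrt(mean_n * (mean_n + 2.0))

def conjectured_bound(mean_n, k=K_C):
    """Conjectured bound k_C/<N+1> on the phase-averaged error."""
    return k / (mean_n + 1.0)

for mean_n in (0.5, 1.0, 10.0, 1000.0):
    print(mean_n, anisimov_rmse(mean_n), 1.0 / (mean_n + 1.0), conjectured_bound(mean_n))
```

For large $\langle N\rangle$ the ratio of the last column to the second approaches $k_C$, which is the factor by which the single-phase value lies below the conjectured bound on the averaged error.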


B. Zhang et al. 

Zhang et al. [13] propose a superposition state with arbitrarily
high phase sensitivity but finite average photon number. They
consider a two-mode Mach-Zehnder interferometer (MZI) system, with
a probe state of the form

$$ |\psi\rangle = \sum_{n \ge 1} c_n |\psi_n\rangle, \qquad |\psi_n\rangle := \frac{1}{\sqrt{2}}\left[ |n, 0\rangle + |0, n\rangle \right]. \tag{97} $$

That is, $|\psi\rangle$ is a superposition of (mutually orthogonal)
NOON states.

They use the quantum Cramer-Rao bound (QCRB) to derive the
ultimate limit to the uncertainty of phase measurement as

$$ \Delta^{0}\Phi \ge \frac{1}{\sqrt{\langle N^2\rangle}}, \tag{98} $$

in contrast to Eq. (95). Also $N = N_a + N_b$ is the total photon
number operator for the two modes $a$ and $b$, rather than just the
number operator $N_a$ for the mode passing through the phase shift.
They call Eq. (98) the "proper" Heisenberg limit, and Eq. (95) the
"generally accepted form" of the Heisenberg limit. This result is
similar to the result given in a number of other works [43-46]. By
choosing $c_n \propto n^{-3/2}$, they obtain
$\langle N\rangle < \infty$ and $\langle N^2\rangle = \infty$, which
gives $\Delta^{0}\Phi \ge 0$. They further claim in Sec. V of
Ref. [13] that this lower bound is achievable (i.e., that the
uncertainty can be zero for finite $\langle N\rangle$).

An interesting feature of their result is that the Fisher
information can be infinite for finite $\langle N\rangle$.
Therefore, it should not be expected that the QCRB can give a
nontrivial lower bound on the uncertainty for fixed
$\langle N\rangle$. Furthermore, the Fisher information is infinite
for all $\phi$.

However, there are some problems with the result presented. First,
they give no proof that the lower bound provided by the QCRB is
achievable. In many cases Fisher's theorem [47] allows the QCRB to
be achieved asymptotically (i.e., with a scaling constant of
$1/\sqrt{m}$ for $m$ probe states). However, Fisher's theorem is
not universally applicable, because it requires a unique maximally
likely estimate [48]. In contrast, here the measurements will yield
multiple maximally likely estimates.

Second, the form of the QCRB given is for unbiased measurements,
but it is unclear how to perform an unbiased measurement here. For
biased measurements this lower bound does not hold. In fact, the
obvious measurement technique is biased, and will only yield zero
error for $\phi = 0$ and $\pi$, similar to the example in
Sec. VIII C. This can be achieved with a very simple choice of
state.

However, measurements that yield zero error only for isolated
values of $\phi$ will not be useful. Further, based on the results
presented here, the average performance of any two-mode MZI
estimate must satisfy

$$ \delta\Phi \ge \frac{k_A}{\langle N_a + 1\rangle}, \tag{99} $$

as a consequence of Eq. (42).



C. Rivas and Luis 

Rivas and Luis [14] consider a linear phase estimation procedure
that employs as the probe state the coherent superposition

$$ |\psi\rangle = \mu|0\rangle + \nu|\xi\rangle \tag{100} $$

of the vacuum $|0\rangle$ and a squeezed state $|\xi\rangle$. The
authors consider the case with $\nu \ll 1$, $\mu \simeq 1$ and also
assume that the phase shift is known to be small: $\phi \ll 1$. The
fixed mean photon number of the probe state is then given by

$$ \langle N\rangle = \nu^2 n_\xi, \tag{101} $$

where $n_\xi$ is the (average) number of photons in the squeezed
state. Using conventional error propagation arguments, they find
for this state

$$ (\Delta^{0}\Phi)^2 \ge \frac{\nu^2}{4m\langle N\rangle^2}, \tag{102} $$

where $m$ is the number of repetitions of the measurement. The
lower bound here is arbitrarily below the usual Heisenberg limit by
a factor $O(\nu^2)$.

We note that similar results can be obtained in a simplified
scenario by employing the probe state

$$ |\psi\rangle = \mu|0\rangle + \nu|n_\xi\rangle, \tag{103} $$

where $|n_\xi\rangle$ denotes a number state with $n_\xi$ photons.
The interference fringes obtained from this state are high
frequency, but low visibility. A calculation using the error
propagation formula based on the observable
$X = |0\rangle\langle n_\xi| + |n_\xi\rangle\langle 0|$ yields

$$ (\Delta^{0}\Phi)^2 = \frac{(\Delta X)^2}{|d\langle X\rangle/d\phi|^2} \ge \frac{\nu^2}{4\langle N\rangle^2}. \tag{104} $$

Taking $\nu \propto \langle N\rangle^{-p}$ gives

$$ \Delta^{0}\Phi \propto \frac{1}{\langle N\rangle^{p+1}}, \tag{105} $$

which, in principle, gives an accuracy that scales arbitrarily well
with $\langle N\rangle$ (for large $p$).

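The error-propagation estimate for the simplified probe state can be evaluated explicitly. The Python sketch below assumes that the phase shift acts as $|n_\xi\rangle \to e^{i n_\xi\phi}|n_\xi\rangle$ (a sign convention chosen for this example), so that $\langle X\rangle = 2\mu\nu\cos(n_\xi\phi)$ and $\langle X^2\rangle = 1$; the function name and parameter values are likewise illustrative.

```python
import math

def error_propagation_variance(mu, nu, n_xi, phi):
    """(Delta X)^2 / |d<X>/dphi|^2 for the state mu|0> + nu*exp(i*n_xi*phi)|n_xi>."""
    mean_x = 2.0 * mu * nu * math.cos(n_xi * phi)
    var_x = 1.0 - mean_x**2      # X^2 is the projector onto span{|0>, |n_xi>}, so <X^2> = 1
    slope = -2.0 * mu * nu * n_xi * math.sin(n_xi * phi)
    return var_x / slope**2

nu = 0.1
mu = math.sqrt(1.0 - nu**2)
n_xi = 50
mean_n = nu**2 * n_xi                              # <N> = nu^2 * n_xi
best = error_propagation_variance(mu, nu, n_xi, math.pi / (2 * n_xi))
print(best, nu**2 / (4.0 * mean_n**2), 1.0 / mean_n**2)
```

At the steepest point of the fringe the variance equals $\nu^2/(4\mu^2\langle N\rangle^2)$, far below $1/\langle N\rangle^2$, but this figure applies only in a narrow neighbourhood of one fringe, which is the limitation discussed next.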
The problem with this scheme is that the high accuracy 
predicted by the error propagation formula is given by 
high frequency fringes with low visibility. It would take 
a great deal of additional phase information to resolve 
the ambiguity in the fringes, as well as many repetitions 
of the measurement to obtain a reasonable estimate of 
the observable X so that the error propagation formula 
would become accurate. 

The scheme presented in Ref. [14] is a little more complicated
(including an analysis of the efficiency), but similar
considerations apply. A quadrature measurement is considered for a
fixed phase, which means that the analysis essentially gives an
estimate of the uncertainty for a given value of the phase. As we
have noted above, it is possible to obtain higher accuracy for a
particular value of the phase shift. For example, it is trivial to
design a measurement that gives zero error for a single value of
the phase shift. The bound (42) must hold when averaging over the
phase shift.



D. Nonlinear interferometry 

A qualitatively different type of proposal for beating the
Heisenberg limit is that based on nonlinear interferometry
[15, 16]. The basis of these proposals is that the generator of the
phase shifts is nonlinear in the number operator. For example,
$G = N^q$ for some $q > 1$. It is then found that the phase
uncertainty can scale as $1/\langle N\rangle^q$. Subtleties
involved in achieving such scalings are discussed in Ref. [27].

These proposals do not contradict the results presented here; they
are just using the terminology differently Q. In Refs. [15, 16] the
Heisenberg limit is given as $1/\langle N\rangle$, where $N$ is the
number of particles. In contrast, here we give the Heisenberg limit
in terms of the generator of the phase shifts. That is, the bound
is

$$ \delta\Phi \ge \frac{k}{\langle G + 1\rangle} = \frac{k}{\langle N^q + 1\rangle}, \tag{106} $$

which typically scales as $k/\langle N\rangle^q$. Therefore the
results do not violate the Heisenberg limit (106) given here. In
Refs. [15, 16], they call this limit the "quantum limit", rather
than the Heisenberg limit.



VIII. LIMITATIONS OF THE CRAMER-RAO 
BOUND 

The Cramer-Rao bound for the RMSE $\Delta^{\phi}\Phi$ is often used
as motivation for the Heisenberg limit, but it has limitations
which mean that it does not provide a rigorous basis for the
Heisenberg limit. There are a number of different variations of the
way the Cramer-Rao bound is used. First, the classical Cramer-Rao
bound (CRB), $1/\sqrt{m F_C(\phi)}$, is in terms of the classical
Fisher information $F_C(\phi)$ of a specific probability
distribution, so in quantum mechanics it is calculated for a given
state and measurement.

Second, the quantum Cramer-Rao bound (QCRB) replaces $F_C(\phi)$ by
the quantum Fisher information, $F_Q(\phi)$ (corresponding to the
classical Fisher information optimised over all quantum
measurements), but is still calculated for a given state [49].
Third, the Helstrom-Holevo bound (HHB), as in Eq. (89), is
optimised over both the quantum measurement and the quantum state,
with the optimisation being for a given $\Delta N$. Because these
bounds use successively more optimisation, one has the ordering
CRB $\ge$ QCRB $\ge$ HHB. In particular, for any estimate that is
unbiased for phase shift $\phi$, one has

$$ \Delta^{\phi}\Phi \ge \frac{1}{\sqrt{m F_C(\phi)}} \ge \frac{1}{\sqrt{m F_Q(\phi)}} \ge \frac{1}{2\sqrt{m}\,\Delta N}. \tag{107} $$

The most obvious limitation in using the HHB is that it is a limit
in terms of $\Delta N$, whereas the Heisenberg limit is in terms of
$\langle N\rangle$. This means that, for states with large
uncertainty in $N$ as compared to the mean value, the HHB does not
imply the Heisenberg limit. This is taken advantage of in
Refs. [13, 14].

A fixed value of $\Delta N$ is just a choice of constraint. One
could also consider optimisation for fixed $\langle N\rangle$, as a
method to obtain the Heisenberg limit. However, it is easily seen
that there is no upper bound on the Fisher information for a given
$\langle N\rangle$. In the example of Zhang et al., they find a
state with infinite Fisher information for finite
$\langle N\rangle$ (see Sec. VII B). Note also that if there were
such a bound, then the CRB would imply a $1/\sqrt{m}$ scaling for
fixed $\langle N\rangle$, whereas we have found that such scaling
is impossible (see Sec. VI). The difficulty of using the CRB was
also noted in Ref. [28].
A. Bias in phase estimation

Another major factor that needs to be taken into account when
considering the CRB and related bounds is that of bias. Note, for
example, that the value of $\Delta N$ in Eq. (107) can be
arbitrarily small, whereas the RMSE cannot be larger than $\pi$. It
follows that any phase estimate must be biased for sufficiently
small $\Delta N$. In fact, one can show that covariant phase
measurements cannot be unbiased for every phase shift value, in the
sense needed for the QCRB and HHB.

In particular, when considering the RMSE with reference phase
$\phi_r$, $\Delta^{\phi_r}\Phi$, one needs to define the bias
function

$$ b_{\phi_r}(\phi) := \langle\Phi\rangle_{\phi_r} - \phi, \tag{108} $$

with

$$ \langle\Phi\rangle_{\phi_r} := \int_{\phi_r - \pi}^{\phi_r + \pi} d\hat\phi\; \hat\phi\, p(\hat\phi|\phi). \tag{109} $$

Then the CRB with bias is [50]

$$ (\Delta^{\phi_r}\Phi)^2 \ge \frac{[1 + b_{\phi_r}'(\phi)]^2}{m F_C(\phi)} + [b_{\phi_r}(\phi)]^2. \tag{110} $$

The QCRB and HHB in Eq. (107) similarly generalise (see also [51]).

If one is to use the form of the CRB without bias, then one needs
$b_{\phi_r}(\phi) = 0$ and $b_{\phi_r}'(\phi) = 0$. This is highly
problematic if one is to consider the full range of values of
$\phi$ with a fixed reference phase $\phi_r$. This is because
$\langle\Phi\rangle_{\phi_r}$ would need to change discontinuously
at $\phi = \phi_r + \pi$. But, for finite $\langle N\rangle$, it is
easily shown that $\langle\Phi\rangle_{\phi_r}$ is a continuous
function of $\phi$ (see Appendix F). Therefore it is not possible
for the phase to be globally unbiased unless $\langle N\rangle$ is
infinite. Moreover, there must be a region of size scaling as
$1/\langle N\rangle$ where the measurement is biased [this follows
from Eq. (F14)].

On the other hand, one can consider applying the CRB to
$\Delta^{\phi}\Phi$ in Eq. ©; that is, to the RMSE modulo
$(-\pi, \pi]$, which is the RMSE with reference phase
$\phi_r = \phi$. Because this quantity is no larger than the RMSE
$\Delta^{\phi_r}\Phi$ for any other reference phase (see Sec. II A),
using the CRB to bound it also yields a bound on
$\Delta^{\phi_r}\Phi$. The conditions for the measurement to be
unbiased then become $b_\phi(\phi) = 0$ and $b_\phi'(\phi) = 0$. It
is important to note that $b_\phi'(\phi)$ is not the same as
$\frac{d}{d\phi} b_\phi(\phi)$. In fact, the restriction
$\frac{d}{d\phi} b_\phi(\phi) = 0$ implies

$$ 0 = \left. \frac{\partial}{\partial\phi_r} b_{\phi_r}(\phi) \right|_{\phi_r = \phi} + b_\phi'(\phi) = 2\pi\, p(\phi + \pi|\phi) + b_\phi'(\phi). \tag{111} $$

That is, if $b_\phi(\phi) = 0$, then $\frac{d}{d\phi} b_\phi(\phi)$
will automatically be zero, but $b_\phi'(\phi)$ will only be zero
if $p(\phi + \pi|\phi) = 0$. In fact, for $b_\phi(\phi) = 0$, the
condition $b_\phi'(\phi) = 0$ is equivalent to
$p(\phi + \pi|\phi) = 0$.

The conditions for the measurement to be unbiased (when applying
the CRB to the RMSE modulo $(-\pi, \pi]$) can therefore be given as
$b_\phi(\phi) = 0$ and $p(\phi + \pi|\phi) = 0$. The condition
$b_\phi(\phi) = 0$ can be satisfied relatively easily, because it
will be satisfied whenever the probability distribution for the
error in the phase estimate is symmetric, so
$p(\phi + \theta|\phi) = p(\phi - \theta|\phi)$. However, it is not
possible to satisfy $p(\phi + \pi|\phi) = 0$ for all $\phi$ when
$\langle N\rangle$ is small. This is also the parameter regime
where the HHB without bias must break down, because it would
predict an impossibly large uncertainty.

Hence, the bias of a given estimate is crucial in any application
of the Cramer-Rao bound to the RMSE. This is in strong contrast to
the Heisenberg-type bounds for the RAMSE derived in this paper,
which are independent of the bias function.



B. Asymptotic achievability 

It is often stated that the CRB (and QCRB and HHB) 
is asymptotically achievable in the limit of many probe 
states, without any further qualification. However, it is
important to note from Eq. (110) that, in the asymptotic limit
$m \to \infty$, the RMSE does not approach zero for a biased
estimate: it is always bounded below by $|b_{\phi_r}(\phi)|$.

Furthermore, Fisher's theorem that Eq. (110) is itself
asymptotically achievable, as $m \to \infty$, does not hold
in all cases of physical interest (Hj]. In particular, this 
theorem assumes that there is a unique maximally likely
estimate [47]. However, this is not the case for many
states considered in quantum phase estimation, including
the NOON states as per Eq. (97), which are the states
that minimize the QCRB. The reason is of course that
there is nothing to distinguish phase shifts modulo $2\pi/n$,
regardless of the number of samples, unless the phase
shift is in fact already known to this accuracy. That
is, there are $n$ maximally likely estimates, so Fisher's
theorem does not apply.
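
The degeneracy of the likelihood can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a two-outcome measurement on an $n$-photon NOON state with outcome probabilities $p(\pm|\phi) = [1 \pm \cos(n\phi)]/2$ (an assumed model), and the function names are hypothetical. It shows that the log-likelihood is unchanged when the phase is shifted by any multiple of $2\pi/n$, however many samples $m$ are taken.

```python
import math

def noon_log_likelihood(phi, n, k_plus, m):
    """Log-likelihood of phase phi for k_plus '+' outcomes in m trials.

    Assumed outcome model (not from the paper):
    p(+|phi) = (1 + cos(n*phi))/2, p(-|phi) = (1 - cos(n*phi))/2.
    """
    p_plus = 0.5 * (1.0 + math.cos(n * phi))
    p_minus = 1.0 - p_plus
    tiny = 1e-300  # guard against log(0) exactly on a dark fringe
    return (k_plus * math.log(max(p_plus, tiny))
            + (m - k_plus) * math.log(max(p_minus, tiny)))

def equivalent_phases(phi, n):
    """The n phases phi + 2*pi*j/n, wrapped into (-pi, pi]."""
    phases = []
    for j in range(n):
        shifted = phi + 2.0 * math.pi * j / n
        wrapped = math.atan2(math.sin(shifted), math.cos(shifted))
        phases.append(wrapped)
    return phases

def likelihood_spread(phi, n, k_plus, m):
    """Largest difference in log-likelihood among the n equivalent phases."""
    values = [noon_log_likelihood(p, n, k_plus, m)
              for p in equivalent_phases(phi, n)]
    return max(values) - min(values)
```

Under this assumed model the spread vanishes up to rounding error for any data set, so a maximum-likelihood estimate cannot be unique.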

In contrast, the above qualifications do not apply to 
the Heisenberg-type bounds for the RAMSE derived in 
this paper, which are independent of the bias of the es- 
timate, and which are asymptotically achievable in the 
sense described in Sec. VI. 



C. Example 

There are obvious phase estimates that are not unbi- 
ased, where the RMSE obtained is qualitatively differ- 
ent from what would be expected from the Cramer-Rao 
bound without correcting for bias. Consider a simple 
measurement with a single photon in a MZI, in the state 

$$\rho_\phi = \tfrac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right) + \tfrac{v}{2}\left(e^{i\phi}|0\rangle\langle 1| + e^{-i\phi}|1\rangle\langle 0|\right), \tag{112}$$

with visibility $v \le 1$. The photon-counting measurement
at the output of the interferometer gives probabilities of
measurement results

$$P(\pm|\phi) = (1 \pm v\cos\phi)/2. \tag{113}$$

For the $+$ measurement result, the optimal (least-square-error)
estimate is $\Phi = 0$, and for the $-$ measurement result
the optimal estimate is $\Phi = \pi$. With these estimates,
the RMSE is given by

$$\Delta^{\phi}\Phi = \sqrt{\phi^2 (1 + v\cos\phi)/2 + (\pi - |\phi|)^2 (1 - v\cos\phi)/2}. \tag{114}$$

The absolute value of $\phi$ is taken above to take account of
the fact that the difference should be determined modulo
$2\pi$. For $v = 1$, the error is zero at $\phi = 0$ and $\phi = \pi$.

In contrast, using the inverse square-root of the Fisher
information $F$ (as for the CRB without correcting for bias)
would give the lower bound

$$\frac{1}{\sqrt{F}} = \frac{\sqrt{1 - v^2\cos^2\phi}}{v|\sin\phi|}. \tag{115}$$

In the limit $v \to 1$, the uncorrected CRB gives a result
exactly equal to 1. This is already greater than the actual
RMSE for some $\phi$. For imperfect visibility, the contrast
is even stronger. The uncorrected bound diverges at $\phi = 0$
and $\pi$, even though the actual measurement error is
a minimum there. This result is illustrated in Fig. 4.
It is therefore clear that the CRB can give completely
misleading results if it is not corrected for bias. On the
other hand, correcting the CRB for bias, via Eq. (110),
yields a bound exactly equal to the RMSE (114).
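
The comparison between Eqs. (114) and (115) can be reproduced with a few lines of code. This is an illustrative sketch only (the function names are hypothetical): it evaluates the RMSE of the two-valued estimate and the inverse square root of the classical Fisher information computed directly from the outcome probabilities of Eq. (113).

```python
import math

def rmse_single_photon(phi, v):
    """RMSE of Eq. (114): estimate 0 for '+', pi for '-', phi in (-pi, pi]."""
    p_plus = 0.5 * (1.0 + v * math.cos(phi))
    p_minus = 0.5 * (1.0 - v * math.cos(phi))
    return math.sqrt(phi ** 2 * p_plus + (math.pi - abs(phi)) ** 2 * p_minus)

def fisher_information(phi, v):
    """Classical Fisher information of the two-outcome distribution (113)."""
    p_plus = 0.5 * (1.0 + v * math.cos(phi))
    p_minus = 0.5 * (1.0 - v * math.cos(phi))
    dp = 0.5 * v * math.sin(phi)  # |dP/dphi| is the same for both outcomes
    return dp ** 2 / p_plus + dp ** 2 / p_minus

def uncorrected_crb(phi, v):
    """Inverse square root of the Fisher information, ignoring bias."""
    return 1.0 / math.sqrt(fisher_information(phi, v))
```

For example, `uncorrected_crb(0.001, 0.99)` is far larger than `rmse_single_photon(0.001, 0.99)`, which is the divergence near $\phi = 0$ described above.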

The QCRB in Eq. (107), which assumes zero bias, gives
$1/v$, and is also violated near $\phi = 0$ and $\phi = \pm\pi$ by
the biased measurements considered here. Similarly, the
HHB in Eq. (107) yields a lower bound of 1 (the same
as the QCRB for $v = 1$), which is also violated by the
biased measurements considered here. Thus we can see
that caution needs to be employed in using Eq. (107), because
it requires unbiased measurements. Such measurements
are impossible in some cases, and even reasonable
measurements can give highly biased estimates, resulting
in a violation of the QCRB and HHB in Eq. (107).

It is also interesting to compare these results to the er- 
ror propagation formula, which is often used to estimate 
the measurement error. The error propagation formula 
leads to the estimate of the error (using measurement
operator $X = |0\rangle\langle 1| + |1\rangle\langle 0|$),

$$\frac{\Delta X}{|d\langle X\rangle/d\phi|} = \frac{\sqrt{1 - v^2\cos^2\phi}}{v|\sin\phi|}. \tag{116}$$

Thus the error propagation formula gives an estimate 
of the uncertainty that is identical to the uncorrected 
Cramer-Rao bound. 
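
As a cross-check of Eq. (116), the error propagation estimate can be evaluated directly from the density matrix of Eq. (112). The sketch below is illustrative only: the helper names are hypothetical, the off-diagonal phase convention is an assumption (it does not affect $\langle X\rangle$), and the derivative is approximated by a central finite difference.

```python
import cmath
import math

def rho_single_photon(phi, v):
    """2x2 density matrix of Eq. (112) as nested lists (assumed phase convention)."""
    off = 0.5 * v * cmath.exp(1j * phi)
    return [[0.5, off], [off.conjugate(), 0.5]]

def mean_x(phi, v):
    """<X> = Tr(rho X) for X = |0><1| + |1><0|, i.e. rho_01 + rho_10."""
    rho = rho_single_photon(phi, v)
    return (rho[0][1] + rho[1][0]).real

def error_propagation(phi, v, h=1e-6):
    """Delta X / |d<X>/dphi|, using X^2 = 1 and a central difference."""
    mean = mean_x(phi, v)
    delta_x = math.sqrt(1.0 - mean ** 2)
    slope = (mean_x(phi + h, v) - mean_x(phi - h, v)) / (2.0 * h)
    return delta_x / abs(slope)
```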




FIG. 4: MSE for phase measurements with a single photon
using an interferometer with visibility $v = 0.99$ and photon
counting at the outputs. The actual MSE, as well as the
Cramer-Rao bound with the correction for bias, is given by
the solid curve (black). The uncorrected Cramer-Rao bound,
as well as the estimate given by the error propagation formula,
is given as the dashed curve (dark blue). The horizontal
dash-dotted line (green) is the conjectured bound on the AMSE,
$k_C^2/\langle N+1\rangle^2$. The horizontal solid line (light blue) is the
actual AMSE for these measurements (obtained by averaging
the MSE over $\phi$), and the horizontal dotted line (red) is the
Helstrom-Holevo bound, which in this case is only slightly
smaller than the quantum Cramer-Rao bound of $1/v$.



IX. CONCLUSIONS 

We have rigorously proven that the square root of the
average mean-square error (RAMSE) of phase measurements
is lower bounded by the Heisenberg limit $k/\langle G+1\rangle$
[Eq. (1)]. The inequality with $k = k_A \approx 0.56$ holds in the
case where the generator of the phase shifts has nonnegative
integer eigenvalues. We obtain a very similar result
in the case where $G$ also has negative integer eigenvalues.
The result is as in Eq. (46), where the absolute value of
$G$ is used, and the scaling constant is again $k_A$.

These results mean that the accuracy of super- 
Heisenberg measurement schemes is essentially illusory. 
They may work for a small range of phases, but if one 
considers the additional resources needed to locate an 
unknown phase to within the required range, the overall 
measurement will not violate the Heisenberg limit. 

A new feature of our form of the Heisenberg limit is
that it holds for all $\langle G\rangle$, not just in the asymptotic limit of
large $\langle G\rangle$. We achieve this by adding 1 to the denominator.
This modification is necessary, because otherwise the
inequality would indicate that the error must approach
infinity in the limit $\langle G\rangle \to 0$. This is impossible because
phase has a bounded range.

As well as the analytical result stated above, we have
very powerful evidence for a stronger bound with $k_A$
replaced by $k_C \approx 1.38$. We have provided extensive numerical
evidence that the inequality holds with this larger
scaling constant, both in the case of the RAMSE and for
the square root of the Holevo variance. In the case where
$G$ has negative eigenvalues, the numerical results indicate
that Eq. (46) holds with the scaling constant $k'_C \approx 0.79$,
which is again larger than $k_A$.

These stronger lower bounds are also supported by
asymptotic expansions of the exact solution for minimal
Holevo variance, both for generators with nonnegative
eigenvalues and generators without this restriction. A
similar result for the RAMSE has also been obtained,
via an asymptotic expansion of a lower bound for this
quantity, for the case of a generator that is not restricted
to nonnegative eigenvalues. The case where the eigenvalues
of $G$ are restricted to be nonnegative is
a possible area for future study. The asymptotic expansions
also enable us to show that these stronger lower
bounds are asymptotically achievable. That is, the minimum
RAMSE is equal to the lower bounds to leading
order.

We showed how various schemes that have been pro- 
posed to break the Heisenberg limit do not break our 
bound on the RAMSE. The primary reason for this is 
that they can only violate the Heisenberg limit scaling 
if the phase shift is already known, not when averaging 
over the phase shift. Another factor is that they typically 
consider the Cramer-Rao bound, which is problematic for
the Heisenberg limit. It cannot provide a nontrivial lower 
bound for fixed mean photon number, as is required for 
the Heisenberg limit. In addition, it requires knowledge 
of the bias. Our alternative approach circumvents these 
limitations. 

Our bound also differs from the Cramer-Rao bound
in that it scales as $1/m$ in the number of copies of the
state. The Cramer-Rao bound scales as $1/\sqrt{m}$, but we
have found that such scaling is impossible for a fixed
$\langle N\rangle$. This indicates that it is fundamentally impossible to
obtain the Heisenberg limit from the Cramer-Rao bound.



Acknowledgments 

DWB is funded by an ARC Future Fellowship 
(FT100100761). MJWH, MZ, and HMW are supported 
by the ARC Centre of Excellence CE110001027. 



Appendix A: Details for the proof of Lemma 2 

There are two results used in the proof of Lemma 2
which are proven here. First, we show that for this operator
the distribution of $G^{(s)}$ is the same as the distribution
of $G$ for $\rho_0$. The normalisation condition for the covariant
measurement gives

$$1 = \int d\theta\, e^{-iG\theta} M_0 e^{iG\theta}, \tag{A1}$$

so

$$\delta_{n,n'}\delta_{d,d'} = \int d\theta\, \langle n,d|e^{-iG\theta} M_0 e^{iG\theta}|n',d'\rangle
= \int d\theta\, e^{i(n'-n)\theta} \langle n,d|M_0|n',d'\rangle
= 2\pi\,\delta_{n,n'} \langle n,d|M_0|n',d'\rangle. \tag{A2}$$

This means that $\langle n,d|M_0|n,d'\rangle = \delta_{d,d'}/2\pi$. Then,
evaluating the distribution for $G^{(s)}$ gives

$$\langle n|\rho_0^{(s)}|n\rangle = 2\pi \sum_{d,d'<D(n)} \langle n,d'|\rho_0|n,d\rangle \langle n,d|M_0|n,d'\rangle
= 2\pi \sum_{d,d'<D(n)} \langle n,d|\rho_0|n,d'\rangle (2\pi)^{-1}\delta_{d,d'}
= \mathrm{Tr}(\rho_0 P_n), \tag{A3}$$

where $P_n := \sum_d |n,d\rangle\langle n,d|$ denotes the projection onto
eigenvalue $n$ of $G$. The expression in the last line is the
distribution of $G$ for $\rho_0$.

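Second, we must show that $\rho_0^{(s)}$ is positive and has
trace one, and is therefore a valid density operator. Note
one can always write the positive operators $\rho_0$ and $M_0$ as
sums of (not necessarily normalised or orthogonal) kets:

$$\rho_0 = \sum_\lambda |\lambda\rangle\langle\lambda|, \qquad M_0 = \sum_\mu |\mu\rangle\langle\mu|. \tag{A4}$$

Hence, for any state $|\psi\rangle = \sum_n \psi_n |n\rangle$,

$$\langle\psi|\rho_0^{(s)}|\psi\rangle = 2\pi \sum_{n,n'\in S,\; d<D(n),\; d'<D(n')} \psi_{n'}^{*}\psi_n\, \langle n',d'|\rho_0|n,d\rangle \langle n,d|M_0|n',d'\rangle
= 2\pi \sum_{\lambda,\mu} |X_{\lambda,\mu}|^2 \ge 0, \tag{A5}$$

where

$$X_{\lambda,\mu} := \sum_{n\in S,\; d<D(n)} \psi_n \langle\lambda|n,d\rangle \langle n,d|\mu\rangle. \tag{A6}$$

Hence $\rho_0^{(s)} \ge 0$, as required. Summing Eq. (A3) over $n$
yields $\mathrm{Tr}(\rho_0^{(s)}) = 1$, so $\rho_0^{(s)}$ is a valid density operator.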

Appendix B: Details for Eq. (45) 

Here we give the details of the derivation of Eq. (45).
First, variation of Eq. (44) gives the optimising distribution

$$p_n = e^{-(\alpha+1)} e^{-\beta|n-g|}, \tag{B1}$$

which is the double exponential (Laplace) distribution
found in [23]. In Ref. [23] it was assumed that $g$ was an
integer, but here we consider the more general case that
$g$ is an arbitrary real number.
We require $\beta > 0$ in order for the distribution to be
normalisable. Then normalisation gives the restriction

$$\frac{e^{-\beta r} + e^{-\beta(1-r)}}{1 - e^{-\beta}} = e^{\alpha+1}, \tag{B2}$$

where $r = \lceil g\rceil - g$. We then find that the mean value is

$$\langle|G-g|\rangle = \frac{(e^{\beta r} + e^{-\beta r})(1-r) + (e^{\beta(1-r)} + e^{-\beta(1-r)})\,r}{(1 - e^{-\beta})(e^{\beta r} + e^{\beta(1-r)})} \tag{B3}$$

and

$$H(G) = (\alpha+1) + \beta\langle|G-g|\rangle
= \ln\!\left[\frac{e^{-\beta r} + e^{-\beta(1-r)}}{1 - e^{-\beta}}\right] + \beta\langle|G-g|\rangle. \tag{B4}$$



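Without loss of generality we take $r \in [0,1/2]$. The
problem is symmetric about $r = 1/2$, so these results
also apply to $r > 1/2$. Then we obtain

$$2\langle|G-g|\rangle + 1 - \exp\!\left\{\beta r + \ln\!\left[\frac{e^{-\beta r} + e^{-\beta(1-r)}}{1 - e^{-\beta}}\right]\right\}
= \frac{(e^{\beta} - e^{2\beta r})[1 + e^{2\beta r} + 2r(e^{\beta} - 1)]}{(e^{\beta} - 1)(e^{\beta} + e^{2\beta r})} \ge 0. \tag{B5}$$

This then yields

$$\beta r + \ln\!\left[\frac{e^{-\beta r} + e^{-\beta(1-r)}}{1 - e^{-\beta}}\right] \le \ln(2\langle|G-g|\rangle + 1). \tag{B6}$$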



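It can be shown that

$$e^{\beta} - (1+\beta) - e^{2\beta r}[e^{-\beta}(1 + 2\beta r) + \beta - 1 - 2\beta r]
= (1 + \beta r - \beta\langle|G-g|\rangle)(1 - e^{-\beta})(e^{\beta} + e^{2\beta r}). \tag{B7}$$

Next, for $\beta > 0$ and $r \ge 0$ we have

$$0 < \beta(1-2r)^2 + 4r
= 1 + \beta(1-2r) - [1 + 2(\beta-2)r - 4\beta r^2]
\le e^{\beta(1-2r)} - [1 + 2(\beta-2)r - 4\beta r^2]. \tag{B8}$$

This means that

$$\frac{d}{d\beta}\left\{e^{\beta} - (1+\beta) - e^{2\beta r}[e^{-\beta}(1 + 2\beta r) + \beta - 1 - 2\beta r]\right\}
= (1 - e^{-\beta})\, e^{2\beta r} \left\{e^{\beta(1-2r)} - [1 + 2(\beta-2)r - 4\beta r^2]\right\} > 0 \tag{B9}$$

for $\beta > 0$. This implies that

$$e^{\beta} - (1+\beta) - e^{2\beta r}[e^{-\beta}(1 + 2\beta r) + \beta - 1 - 2\beta r] \ge 0 \tag{B10}$$

for $\beta \ge 0$, because the left-hand side is zero for $\beta = 0$,
and has positive slope for $\beta > 0$. Using Eq. (B7), this
gives

$$-\beta r + \beta\langle|G-g|\rangle \le 1. \tag{B11}$$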

Now adding Eqs. (B6) and (B11) yields

$$\ln\!\left[\frac{e^{-\beta r} + e^{-\beta(1-r)}}{1 - e^{-\beta}}\right] + \beta\langle|G-g|\rangle \le \ln(2\langle|G-g|\rangle + 1) + 1, \tag{B12}$$

and substitution into Eq. (B4) gives Eq. (45) as required.
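
The chain of inequalities above can be sanity-checked numerically. The following is a minimal sketch under stated assumptions: the distribution of Eq. (B1) is taken over all integers $n$ and the infinite sums are truncated at a finite range (the `n_max` parameter is an assumption of this sketch), and the function names are hypothetical. It compares the directly summed mean $\langle|G-g|\rangle$ with the closed form of Eq. (B3), and the entropy with the bound $\ln(2\langle|G-g|\rangle+1)+1$.

```python
import math

def laplace_stats(beta, g, n_max=2000):
    """Mean of |n - g| and entropy (in nats) of p_n proportional to exp(-beta*|n - g|).

    The sum over all integers is truncated to 2*n_max + 1 terms around g;
    this truncation is an assumption of the sketch.
    """
    centre = math.floor(g)
    ns = range(centre - n_max, centre + n_max + 1)
    weights = [math.exp(-beta * abs(n - g)) for n in ns]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    mean_abs = sum(p * abs(n - g) for p, n in zip(probs, ns))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return mean_abs, entropy

def closed_form_mean_abs(beta, g):
    """Closed form of Eq. (B3) with r = ceil(g) - g."""
    r = math.ceil(g) - g
    num = ((math.exp(beta * r) + math.exp(-beta * r)) * (1.0 - r)
           + (math.exp(beta * (1.0 - r)) + math.exp(-beta * (1.0 - r))) * r)
    den = (1.0 - math.exp(-beta)) * (math.exp(beta * r) + math.exp(beta * (1.0 - r)))
    return num / den

def entropy_bound(mean_abs):
    """Right-hand side of Eq. (B12)."""
    return math.log(2.0 * mean_abs + 1.0) + 1.0
```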



Appendix C: Inequality proofs 

To prove the inequality in Eq. (79), consider the function

$$\Delta(\theta) = \sqrt{(\pi^2/2)(1 - \cos\theta) - (\pi^2/4 - 1)(1 - \cos 2\theta)/2} - \theta. \tag{C1}$$

Taking the derivative with respect to $\theta$ and solving to find
the turning points of $\Delta(\theta)$ yields only two in the range
$[0,\pi]$. One is at $\theta = 0$, and the other is at $\theta \approx 2.23$. As
$\Delta(\theta) > 0$ for $\theta = 2.23$, and $\Delta(\theta) = 0$ for $\theta = 0$ or $\pi$, we
have $\Delta(\theta) \ge 0$ for $\theta \in [0,\pi]$. This proves Eq. (79) for
$\theta \in [0,\pi]$, and the result for $\theta \in [-\pi, 0]$ follows because
Eq. (79) is symmetric.
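
A quick numerical check of this argument is sketched below. It is illustrative only: it assumes that $\Delta(\theta)$ is the square root of the right-hand side of Eq. (79) minus $\theta$, as in Eq. (C1), and it samples a uniform grid rather than locating turning points analytically. The function names are hypothetical.

```python
import math

def f_upper(theta):
    """Right-hand side of Eq. (79) evaluated at a fixed angle theta."""
    return ((math.pi ** 2 / 2.0) * (1.0 - math.cos(theta))
            - (math.pi ** 2 / 4.0 - 1.0) * (1.0 - math.cos(2.0 * theta)) / 2.0)

def delta(theta):
    """Delta(theta) of Eq. (C1); max() guards against tiny negative rounding."""
    return math.sqrt(max(f_upper(theta), 0.0)) - theta

def min_delta_on_grid(points=4001):
    """Smallest value of Delta on a uniform grid over [0, pi]."""
    return min(delta(math.pi * i / (points - 1)) for i in range(points))
```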

Next, to prove a lower bound on $\delta\Phi$, we use Eq. (20),
which was

$$\langle\cos\Theta\rangle \ge \cos\sqrt{\langle\Theta^2\rangle}. \tag{C2}$$

Using this, we have

$$\delta\Phi = \sqrt{\langle\Theta^2\rangle} \ge \arccos\langle\cos\Theta\rangle = \arccos[1 - (\delta_1\Phi)^2/2]. \tag{C3}$$

Now note that, if we have a state that minimises $\delta\Phi$, then
it can not give a value of $\arccos[1 - (\delta_1\Phi)^2/2]$ smaller
than that for the minimum value of $\delta_1\Phi$. This means
that we can lower bound $\delta\Phi$ by the minimum value of
$\arccos[1 - (\delta_1\Phi)^2/2]$. This is a tighter lower bound on
$\delta\Phi$ than $\delta_1\Phi$, because $\arccos(1 - x^2/2) = x + x^3/24 +
O(x^5)$. Unfortunately $\arccos[1 - (\delta_1\Phi)^2/2]$ can still be
below $k_C/\langle N+1\rangle$.
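
The series quoted for $\arccos(1 - x^2/2)$ is easy to confirm numerically. The short sketch below (hypothetical function names) compares the exact expression with the two-term expansion $x + x^3/24$.

```python
import math

def arccos_form(x):
    """Exact value of arccos(1 - x^2/2) for 0 <= x <= 2."""
    return math.acos(1.0 - x * x / 2.0)

def two_term_series(x):
    """Leading terms x + x^3/24 of the expansion about x = 0."""
    return x + x ** 3 / 24.0
```

Since the cubic correction is positive, the arccos form exceeds $x$ for small positive $x$, which is why it gives the tighter lower bound.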



Appendix D: Asymptotic behaviour for Holevo 
variance 

Here we derive the asymptotic results for the variance 
given in Sec. ED Using Eq. (12) of Ref. [H (with q = 
p + 1), one has 



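$$\sum_{k=1}^{\infty} J_{x+k}(z) J_{x+k+1}(z) = \frac{z}{2}\left[J_{x+1}^2(z) - J_x(z) J_{x+2}(z)\right] = \frac{z}{2} J_{x+1}^2(z), \tag{D1}$$

where the second equality follows from Eq. (73).
Second, from Eq. (32) of Ref. [52], one has

$$\sum_{k=1}^{\infty} [J_{x+k}(z)]^2 = \frac{z}{2}\left[J_{x+1}(z)\frac{\partial J_x(z)}{\partial x} - J_x(z)\frac{\partial J_{x+1}(z)}{\partial x}\right] = \frac{z}{2} J_{x+1}(z)\frac{\partial J_x(z)}{\partial x}, \tag{D2}$$

where the second equality similarly follows via Eq. (73).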
We therefore have

$$\langle e^{i\Theta}\rangle = \frac{J_{x+1}(z)}{[\partial J_x(z)/\partial x]}. \tag{D3}$$

Using Eq. (77) then yields



z</ T +i (z) 
[dJ x {z)/dx\ 



(D4) 



There are a number of asymptotic results that we can 
use. From [H3, 01 



$$z = x + \gamma x^{1/3} + \frac{3\gamma^2}{10 x^{1/3}} + \frac{5 - \gamma^3}{350 x} - \frac{479\gamma^4 + 20\gamma}{63000 x^{5/3}} + \frac{20231\gamma^5 - 27550\gamma^2}{8085000 x^{7/3}} + O(x^{-3}), \tag{D5}$$

where $\gamma = |z_A|/2^{1/3}$, and $z_A$ is the first zero of the Airy
function. We can invert this relation to give



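$$x = z - \gamma z^{1/3} + \frac{\gamma^2}{30 z^{1/3}} - \frac{5 - \gamma^3}{350 z} + \frac{281\gamma^4 - 5220\gamma}{567000 z^{5/3}} + \frac{73769\gamma^5 - 3312450\gamma^2}{654885000 z^{7/3}} + O(z^{-3}). \tag{D6}$$

We also have

$$J_x(x + y x^{1/3}) \sim \frac{2^{1/3}}{x^{1/3}} \mathrm{Ai}(-2^{1/3} y)\left[1 + \sum_{k=1}^{\infty} \frac{f_k(y)}{x^{2k/3}}\right] + \frac{2^{2/3}}{x} \mathrm{Ai}'(-2^{1/3} y) \sum_{k=0}^{\infty} \frac{g_k(y)}{x^{2k/3}}. \tag{D7}$$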



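The functions are given by

$$f_1(y) = -\frac{1}{5} y, \tag{D8}$$
$$f_2(y) = -\frac{9}{100} y^5 + \frac{3}{35} y^2, \tag{D9}$$
$$f_3(y) = \frac{957}{7000} y^6 - \frac{173}{3150} y^3 - \frac{1}{225}, \tag{D10}$$
$$f_4(y) = \frac{27}{20000} y^{10} - \frac{23573}{147000} y^7 + \frac{5903}{138600} y^4 + \frac{947}{346500} y, \tag{D11}$$
$$g_0(y) = \frac{3}{10} y^2, \tag{D12}$$
$$g_1(y) = -\frac{17}{70} y^3 + \frac{1}{70}, \tag{D13}$$
$$g_2(y) = -\frac{9}{1000} y^7 + \frac{611}{3150} y^4 - \frac{37}{3150} y, \tag{D14}$$
$$g_3(y) = \frac{549}{28000} y^8 - \frac{110767}{693000} y^5 + \frac{79}{12375} y^2. \tag{D15}$$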

Using Eq. (D5) we have

$$y = (z - x)/x^{1/3} = \gamma + \frac{3\gamma^2}{10 x^{2/3}} + \frac{5 - \gamma^3}{350 x^{4/3}} - \frac{479\gamma^4 + 20\gamma}{63000 x^{2}} + \frac{20231\gamma^5 - 27550\gamma^2}{8085000 x^{8/3}} + O(x^{-10/3}). \tag{D16}$$

We can then substitute Eq. (D6), which gives

$$y = \gamma + \frac{3\gamma^2}{10 z^{2/3}} + \frac{5 + 69\gamma^3}{350 z^{4/3}} + \frac{9361\gamma^4 + 1180\gamma}{63000 z^{2}} + \frac{8691349\gamma^5 + 1484550\gamma^2}{72765000 z^{8/3}} + O(z^{-10/3}). \tag{D17}$$
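
The consistency of the expansions in Eqs. (D5), (D6) and (D17) can be checked numerically. The sketch below is illustrative only: the coefficients and signs are transcribed under the assumption that Eq. (D6) is the term-by-term inverse of Eq. (D5), the numerical value of $|z_A|$ is the standard first zero of the Airy function, and the function names are hypothetical.

```python
import math

AIRY_ZERO = 2.338107410459767          # |z_A|, magnitude of first zero of Ai
GAMMA = AIRY_ZERO / 2.0 ** (1.0 / 3.0)  # gamma = |z_A| / 2^(1/3)

def z_of_x(x):
    """Truncated series of Eq. (D5) for the first zero z of J_x."""
    g = GAMMA
    return (x + g * x ** (1 / 3) + 3 * g ** 2 / (10 * x ** (1 / 3))
            + (5 - g ** 3) / (350 * x)
            - (479 * g ** 4 + 20 * g) / (63000 * x ** (5 / 3))
            + (20231 * g ** 5 - 27550 * g ** 2) / (8085000 * x ** (7 / 3)))

def x_of_z(z):
    """Truncated inverse series of Eq. (D6)."""
    g = GAMMA
    return (z - g * z ** (1 / 3) + g ** 2 / (30 * z ** (1 / 3))
            - (5 - g ** 3) / (350 * z)
            + (281 * g ** 4 - 5220 * g) / (567000 * z ** (5 / 3))
            + (73769 * g ** 5 - 3312450 * g ** 2) / (654885000 * z ** (7 / 3)))

def y_of_z(z):
    """Truncated series of Eq. (D17) for y = (z - x)/x^(1/3)."""
    g = GAMMA
    return (g + 3 * g ** 2 / (10 * z ** (2 / 3))
            + (5 + 69 * g ** 3) / (350 * z ** (4 / 3))
            + (9361 * g ** 4 + 1180 * g) / (63000 * z ** 2)
            + (8691349 * g ** 5 + 1484550 * g ** 2) / (72765000 * z ** (8 / 3)))
```

For large order, applying `x_of_z` to `z_of_x(x)` should return `x` up to the neglected higher-order terms.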
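Now to take the derivative with respect to the order,
we can use

$$\frac{\partial}{\partial x} J_x(z) = \frac{\partial}{\partial x} J_x(x + y x^{1/3}) + \frac{\partial y}{\partial x}\frac{\partial}{\partial y} J_x(x + y x^{1/3})
= \frac{\partial}{\partial x} J_x(x + y x^{1/3}) - \left(\frac{z}{3x^{4/3}} + \frac{2}{3x^{1/3}}\right)\frac{\partial}{\partial y} J_x(x + y x^{1/3}). \tag{D18}$$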



In the resulting expression it is possible to expand in a 
series for y about 7, then expand in a series about z. 
It is possible to determine a series in z for 



zJ x+1 (z) 
[dJ x {z)/dx] 



(D19) 



We can then invert this series, finding a series in (N + 1) 
for z. Similarly, it is possible to find a series in z for 



Jx+l{z) 

[dJ x (z)/dx] 



Then we can express 



Jx+l(z) 

[dj x {z)/dx\ 



-1. 



(D20) 



(D21) 



as a series in $z$. Substituting the series for $z$ in $\langle N+1\rangle$,
the overall result is as in Eq. (75), with

$$b_2 = \frac{4|z_A|^3}{27} = k_C^2 \approx 1.8936, \tag{D22}$$
$$b_4 = \frac{16|z_A|^6}{1215} \approx 2.1514, \tag{D23}$$
$$b_6 = \frac{16|z_A|^6(27 + 40|z_A|^3)}{688905}, \tag{D24}$$
$$b_8 = \frac{256|z_A|^9(3 + |z_A|^3)}{4428675}, \tag{D25}$$
$$b_{10} = \frac{64|z_A|^9(2673 + 9252|z_A|^3 + 1120|z_A|^6)}{21483502425}. \tag{D26}$$
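
The leading coefficients can be evaluated directly from the first zero of the Airy function. The sketch below is illustrative: it assumes the closed forms $b_2 = 4|z_A|^3/27$ and $b_4 = 16|z_A|^6/1215$, which reproduce the quoted values 1.8936 and 2.1514, and uses the standard numerical value of $|z_A|$. The function names are hypothetical.

```python
import math

AIRY_ZERO = 2.338107410459767  # |z_A|, magnitude of first zero of Ai

def b2():
    """Leading coefficient, assumed form 4|z_A|^3/27 (equal to k_C^2)."""
    return 4.0 * AIRY_ZERO ** 3 / 27.0

def b4():
    """Next coefficient, assumed form 16|z_A|^6/1215."""
    return 16.0 * AIRY_ZERO ** 6 / 1215.0

def k_c():
    """Scaling constant k_C as the square root of b2."""
    return math.sqrt(b2())
```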

Next, to place an upper bound on the RAMSE for this
state, we use Eq. (79). This equation gives

$$\langle\Theta^2\rangle \le (\pi^2/2)(1 - \langle\cos\Theta\rangle) - (\pi^2/4 - 1)(1 - \langle\cos 2\Theta\rangle)/2. \tag{D27}$$

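To find the expectation value $\langle e^{i2\Theta}\rangle$, we use

$$\langle e^{i2\Theta}\rangle = A^2 \sum_{n=0}^{\infty} J_{x+n+1}(z) J_{x+n+3}(z)
= \frac{\sum_{k=1}^{\infty} J_{x+k}(z) J_{x+k+2}(z)}{\sum_{k=1}^{\infty} [J_{x+k}(z)]^2}. \tag{D28}$$

Again using Eq. (12) of Ref. [52], but now with $q = p + 2$,
we have

$$\sum_{k=1}^{\infty} J_{x+k}(z) J_{x+k+2}(z) = \frac{z}{4}\left[J_{x+1}(z) J_{x+2}(z) - J_x(z) J_{x+3}(z)\right] = \frac{z}{4} J_{x+1}(z) J_{x+2}(z). \tag{D29}$$

That then gives

$$\langle e^{i2\Theta}\rangle = \frac{J_{x+2}(z)}{2[\partial J_x(z)/\partial x]}. \tag{D30}$$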

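Using this expression, and expanding the Bessel function
solution in a series as above, we obtain

$$\langle\Theta^2\rangle \le (\pi^2/2)(1 - \langle\cos\Theta\rangle) - (\pi^2/4 - 1)(1 - \langle\cos 2\Theta\rangle)/2
= \frac{4|z_A|^3}{27\langle N+1\rangle^2} + \frac{(\pi^2 - 4)|z_A|^3}{54\langle N+1\rangle^3} + O\!\left(\frac{1}{\langle N+1\rangle^4}\right). \tag{D31}$$

This results in the inequality given in Eq. (80).
Expanding in a series also gives

$$[\arccos\langle\cos\Theta\rangle]^2 = \frac{4|z_A|^3}{27\langle N+1\rangle^2} - \frac{16|z_A|^6}{10935\langle N+1\rangle^4} + O\!\left(\frac{1}{\langle N+1\rangle^6}\right). \tag{D32}$$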


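Using Eq. (C3), we therefore have the upper and lower
bounds on $(\delta\Phi)^2$,

$$\frac{k_C^2}{\langle N+1\rangle^2} + O\!\left(\frac{1}{\langle N+1\rangle^4}\right) \le (\delta\Phi)^2 \le \frac{k_C^2}{\langle N+1\rangle^2} + O\!\left(\frac{1}{\langle N+1\rangle^3}\right). \tag{D33}$$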



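Appendix E: Asymptotic behaviour for variance
with fixed $\langle|J|\rangle$

The normalization constraint yields, using Eq. (32) of
Ref. [52],

$$A^{-2} = [J_x(z)]^2 + 2\sum_{j=1}^{\infty} [J_{x+j}(z)]^2
= [J_x(z)]^2 + z\left[J_{x+1}(z)\frac{\partial J_x(z)}{\partial x} - J_x(z)\frac{\partial J_{x+1}(z)}{\partial x}\right]. \tag{E1}$$

We also have

$$\langle|J|\rangle = 2A^2 \sum_{j=1}^{\infty} j\,[J_{x+j}(z)]^2, \qquad
\langle e^{i\Theta}\rangle = A^2\left[2 J_x(z) J_{x+1}(z) + 2\sum_{j=1}^{\infty} J_{x+j}(z) J_{x+j+1}(z)\right]. \tag{E2}$$

Using Eq. (12) of [52] we have

$$\langle e^{i\Theta}\rangle = A^2\left[2 J_x(z) J_{x+1}(z) + z J_{x+1}^2(z) - z J_x(z) J_{x+2}(z)\right]. \tag{E3}$$

In this case we have

$$\langle e^{i\Theta}\rangle = \alpha + \beta\langle|J|\rangle = (x + \langle|J|\rangle)/z, \tag{E4}$$

so

$$\langle|J|\rangle = z\langle e^{i\Theta}\rangle - x. \tag{E5}$$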

The first zero of the derivative of the Bessel function 
is given by [H, l54l | 



$$z = x + \gamma' x^{1/3} + \left(\frac{3\gamma'^2}{10} - \frac{1}{10\gamma'}\right)\frac{1}{x^{1/3}} - \left(\frac{\gamma'^3}{350} + \frac{1}{25} + \frac{1}{200\gamma'^3}\right)\frac{1}{x} - \frac{958\gamma'^9 - 2036\gamma'^6 - 84\gamma'^3 + 63}{126000\,\gamma'^5 x^{5/3}} + O(x^{-7/3}). \tag{E6}$$

We have corrected an error in Ref. [53] where "840" was
given instead of "84". Here $\gamma' = |z'_A|/2^{1/3}$, where $z'_A$ is
the first zero of the derivative of the Airy function. Inverting
this series, and performing series expansions similar
to that for the first case, gives the series in Eq. (86)
with

$$d_2 = \frac{16|z'_A|^3}{27} = k'^2_C \approx 0.6266, \tag{E7}$$
$$d_3 = \frac{32|z'_A|^3}{27} \approx 1.2533, \tag{E8}$$
$$d_4 = \frac{16|z'_A|^3(111 - 4|z'_A|^3)}{1215} \approx 1.4868, \tag{E9}$$
$$d_5 = \frac{64|z'_A|^3(21 - 4|z'_A|^3)}{1215} \approx 0.9341, \tag{E10}$$
$$d_6 = \frac{16|z'_A|^3(-63 - 40488|z'_A|^3 + 160|z'_A|^6)}{1148175} \approx -0.6292. \tag{E11}$$


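To determine an upper bound, we need to determine

$$\langle e^{i2\Theta}\rangle = A^2\left[J_{x+1}^2(z) + 2J_x(z) J_{x+2}(z) + 2\sum_{j=1}^{\infty} J_{x+j}(z) J_{x+j+2}(z)\right]
= A^2\left\{J_{x+1}^2(z) + 2J_x(z) J_{x+2}(z) + \frac{z}{2}\left[J_{x+1}(z) J_{x+2}(z) - J_x(z) J_{x+3}(z)\right]\right\}. \tag{E12}$$

Expanding in a series then gives

$$\langle\Theta^2\rangle \le \frac{16|z'_A|^3}{27\langle 2|J|+1\rangle^2} + \frac{32|z'_A|^3}{27\langle 2|J|+1\rangle^3} + O\!\left(\frac{1}{\langle 2|J|+1\rangle^4}\right). \tag{E13}$$

Therefore, the MSE is upper and lower bounded as

$$\frac{k'^2_C}{\langle 2|J|+1\rangle^2} + O\!\left(\frac{1}{\langle 2|J|+1\rangle^3}\right) \le (\delta\Phi)^2 \le \frac{k'^2_C}{\langle 2|J|+1\rangle^2} + O\!\left(\frac{1}{\langle 2|J|+1\rangle^3}\right). \tag{E14}$$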



Appendix F: Continuity of the expected phase 
estimate 

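Here we show that $\langle\Phi\rangle_\phi^{\phi_r}$ is a continuous function of $\phi$.
Defining

$$X_{\phi_r} := \int_{\phi_r-\pi}^{\phi_r+\pi} d\Phi\, \Phi\, M_\Phi, \tag{F1}$$

we have

$$\langle\Phi\rangle_\phi^{\phi_r} = \mathrm{Tr}(X_{\phi_r}\rho_\phi). \tag{F2}$$

The expectation value of the phase estimate at $\phi + \epsilon$ is

$$\langle\Phi\rangle_{\phi+\epsilon}^{\phi_r} = \mathrm{Tr}(X_{\phi_r} e^{-iG\epsilon}\rho_\phi e^{iG\epsilon}). \tag{F3}$$

The difference is

$$|\langle\Phi\rangle_{\phi+\epsilon}^{\phi_r} - \langle\Phi\rangle_\phi^{\phi_r}| = |\mathrm{Tr}[X_{\phi_r}(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)]|. \tag{F4}$$

Take $|\xi_j\rangle$ to be the eigenbasis of $X_{\phi_r}$. Then

$$|\langle\xi_j|X_{\phi_r}|\xi_j\rangle| \le \int_{\phi_r-\pi}^{\phi_r+\pi} d\Phi\, |\Phi|\, \langle\xi_j|M_\Phi|\xi_j\rangle
\le 2\pi \int_{\phi_r-\pi}^{\phi_r+\pi} d\Phi\, \langle\xi_j|M_\Phi|\xi_j\rangle
= 2\pi\langle\xi_j|1|\xi_j\rangle = 2\pi. \tag{F5}$$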

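Using this, we find

$$|\mathrm{Tr}[X_{\phi_r}(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)]|
= \Big|\sum_j \langle\xi_j|X_{\phi_r}|\xi_j\rangle \langle\xi_j|(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)|\xi_j\rangle\Big|
\le 2\pi \sum_j |\langle\xi_j|(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)|\xi_j\rangle|. \tag{F6}$$

Take $|\zeta_k\rangle$ to be the eigenbasis of $e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi$. Then

$$\sum_j |\langle\xi_j|(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)|\xi_j\rangle|
= \sum_j \Big|\sum_k |\langle\xi_j|\zeta_k\rangle|^2 \langle\zeta_k|(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)|\zeta_k\rangle\Big|
\le \sum_k |\langle\zeta_k|(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)|\zeta_k\rangle|
= \|e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi\|_1. \tag{F7}$$

Hence

$$|\mathrm{Tr}[X_{\phi_r}(e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi)]| \le 2\pi \|e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi\|_1. \tag{F8}$$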

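Take the state to be given by

$$\rho_\phi = \sum_j p_j |\psi_j\rangle\langle\psi_j|. \tag{F9}$$

For $|\psi_j\rangle$,

$$\langle\psi_j|e^{-iG\epsilon}|\psi_j\rangle = \sum_k |\langle k|\psi_j\rangle|^2 e^{-ik\epsilon}
= 1 - \sum_k |\langle k|\psi_j\rangle|^2 (1 - e^{-ik\epsilon}). \tag{F10}$$

Evaluating the distance from 1 gives

$$\Big|\sum_k |\langle k|\psi_j\rangle|^2 (1 - e^{-ik\epsilon})\Big|
\le \sum_k |\langle k|\psi_j\rangle|^2\, |1 - e^{-ik\epsilon}|
\le \sum_k |\langle k|\psi_j\rangle|^2\, |k\epsilon|, \tag{F11}$$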



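so

$$D(|\psi_j\rangle, e^{-iG\epsilon}|\psi_j\rangle) = 2\sqrt{1 - |\langle\psi_j|e^{-iG\epsilon}|\psi_j\rangle|^2}
\le 2\sqrt{1 - \big|1 - \langle\psi_j||G||\psi_j\rangle|\epsilon|\big|^2}
\le 2\sqrt{2\langle\psi_j||G||\psi_j\rangle|\epsilon|}. \tag{F12}$$

By the convexity of trace distance

$$\|e^{-iG\epsilon}\rho_\phi e^{iG\epsilon} - \rho_\phi\|_1 \le \sum_j p_j D(|\psi_j\rangle, e^{-iG\epsilon}|\psi_j\rangle)
\le \sum_j p_j\, 2\sqrt{2\langle\psi_j||G||\psi_j\rangle|\epsilon|}
\le 2\sqrt{2\langle|G|\rangle|\epsilon|}, \tag{F13}$$

where $\langle|G|\rangle = \mathrm{Tr}(|G|\rho)$. Hence

$$|\langle\Phi\rangle_{\phi+\epsilon}^{\phi_r} - \langle\Phi\rangle_\phi^{\phi_r}| \le 4\pi\sqrt{2\langle|G|\rangle|\epsilon|}. \tag{F14}$$

Thus the expectation value of the phase estimate must
be a continuous function of $\phi$ unless $\langle|G|\rangle$ is infinite.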
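
The pure-state steps of this argument can be illustrated numerically. The following is a minimal sketch under stated assumptions: the state is pure, $G$ has nonnegative integer eigenvalues $k = 0, 1, 2, \ldots$ with probabilities `probs[k]`, and the function names are hypothetical. It checks the bound of Eq. (F11) on the overlap and the resulting bound on the distance in Eq. (F12).

```python
import cmath
import math

def overlap(probs, eps):
    """<psi|exp(-i G eps)|psi> for a pure state with G-distribution probs[k]."""
    return sum(p * cmath.exp(-1j * k * eps) for k, p in enumerate(probs))

def mean_abs_g(probs):
    """<|G|> for eigenvalues k = 0, 1, 2, ... (all nonnegative here)."""
    return sum(k * p for k, p in enumerate(probs))

def overlap_deficit(probs, eps):
    """Left-hand side of Eq. (F11): |1 - <psi|exp(-i G eps)|psi>|."""
    return abs(1.0 - overlap(probs, eps))

def pure_state_distance(probs, eps):
    """D of Eq. (F12): 2*sqrt(1 - |<psi|exp(-i G eps)|psi>|^2)."""
    return 2.0 * math.sqrt(max(1.0 - abs(overlap(probs, eps)) ** 2, 0.0))

def distance_bound(probs, eps):
    """Right-hand side of Eq. (F12): 2*sqrt(2 <|G|> |eps|)."""
    return 2.0 * math.sqrt(2.0 * mean_abs_g(probs) * abs(eps))
```

Both bounds shrink to zero with $|\epsilon|$ whenever $\langle|G|\rangle$ is finite, which is the content of the continuity statement.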
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